Revisiting randomized choices in isolation forests

10/26/2021
by David Cortes, et al.

Isolation forest or "iForest" is an intuitive and widely used algorithm for anomaly detection that follows a simple yet effective idea: in a given data distribution, if a threshold (split point) is selected uniformly at random within the range of some variable and data points are divided according to whether they are greater or smaller than this threshold, outlier points are more likely to end up alone or in the smaller partition. The original procedure chose both the variable to split and the split point within that variable uniformly at random at each step, but this paper shows that "clustered" diverse outliers, often a more interesting class of outliers than others, can be identified more easily by choosing variables and/or thresholds non-uniformly at random. Different split-guiding criteria are compared, and some are found to give significantly better outlier discrimination for certain classes of outliers.
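The isolation idea described above can be illustrated with a minimal one-dimensional sketch. It is not the paper's implementation. It only covers the original uniformly random threshold choice on a single variable, and the function names (`isolation_depth`, `average_depth`), the synthetic data, and the trial count are all assumptions made for illustration.

```python
import random

def isolation_depth(values, target_index, rng, max_depth=50):
    """Count uniformly random splits until values[target_index] is alone.

    Hypothetical helper: after each split, only the partition that
    contains the target point is kept, as in a single isolation tree path.
    """
    idx = list(range(len(values)))
    depth = 0
    while len(idx) > 1 and depth < max_depth:
        lo = min(values[i] for i in idx)
        hi = max(values[i] for i in idx)
        if lo == hi:
            break  # remaining points are identical and cannot be separated
        threshold = rng.uniform(lo, hi)  # split point chosen uniformly at random
        target_side = values[target_index] <= threshold
        idx = [i for i in idx if (values[i] <= threshold) == target_side]
        depth += 1
    return depth

def average_depth(values, target_index, n_trials=200, seed=0):
    """Average the isolation depth of one point over many random paths."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target_index, rng) for _ in range(n_trials))
    return total / n_trials

# Synthetic data (an assumption): 100 standard-normal inliers plus one far point.
data_rng = random.Random(42)
data = [data_rng.gauss(0.0, 1.0) for _ in range(100)] + [8.0]

outlier_depth = average_depth(data, target_index=100)
inlier_depth = average_depth(data, target_index=0)
print(f"average depth, outlier: {outlier_depth:.2f}")
print(f"average depth, inlier:  {inlier_depth:.2f}")
```

On data like this, the far point should need fewer splits on average than a point inside the cluster, which is the property that isolation-based scores rely on. A guided (non-uniform) choice of variable or threshold, as studied in the paper, would replace the `rng.uniform` call with some other selection rule.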


