# Local Outlier Factor

Local Outlier Factor (LOF) is another density-based approach to identifying outliers. It is well suited to datasets that contain a mixture of data distributions with different densities.

The figure above shows two different distributions: a dense cluster of points and a sparse distribution of points. In such datasets, outlier detection should be performed locally for each distribution, i.e., points within one distribution should not affect outlier detection in another. The LOF algorithm follows this intuition and computes an anomaly score for each point as follows:

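1. For each data point $X$, let $D^k(X)$ be the distance from $X$ to its $k^{th}$ nearest neighbor, and let $L_{k}(X)$ be the set of points within distance $D^k(X)$ of $X$ (excluding $X$ itself).
2. Compute the reachability distance of each data point $X$ with respect to a point $Y$ as $R_{k}(X, Y) = \max(dist(X, Y), D^k(Y))$
3. Compute the average reachability distance $AR_{k}(X)$ of data point $X$ over its neighborhood as $AR_{k}(X) = MEAN_{Y \in L_{k}(X)} R_{k}(X, Y)$
4. In the final step, compute the LOF score of each point $X$ as $LOF_{k}(X) = MEAN_{Y \in L_{k}(X)} \frac{AR_{k}(X)}{AR_{k}(Y)}$

Points inside a region of uniform density get a score close to 1, while points whose average reachability distance is much larger than that of their neighbors get a score well above 1.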
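The four steps can be sketched in plain Python as follows. This is a minimal, unoptimized illustration rather than a reference implementation: the function name `lof_scores` is made up for this example, distances are assumed to be Euclidean, and the data is assumed to contain no more than $k$ exact duplicates of any point (otherwise an average reachability distance can be zero).

```python
import math


def lof_scores(points, k):
    """Return LOF_k for every point, following the four steps above."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances (assumption: Euclidean metric).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: D^k(X) and the neighborhood L_k(X) (X excluded, ties included).
    d_k, neighbors = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        d_k.append(others[k - 1])
        neighbors.append(
            [j for j in range(n) if j != i and dist[i][j] <= d_k[i]]
        )

    # Steps 2-3: AR_k(X) is the mean of R_k(X, Y) = max(dist(X, Y), D^k(Y)).
    ar = [
        sum(max(dist[i][j], d_k[j]) for j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    ]

    # Step 4: LOF_k(X) is the mean of AR_k(X) / AR_k(Y) over Y in L_k(X).
    return [
        sum(ar[i] / ar[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    ]


# Four corners of a unit square plus one far-away point.
data = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
scores = lof_scores(data, k=2)
```

In this toy dataset the four square corners score 1, and the isolated point at `(10, 10)` receives a much larger score. The nested loops cost $O(n^2)$ time and memory, so this form is only practical for small datasets.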

No single value of $k$ works best for every dataset, so it is often better to follow an ensemble approach: compute LOF scores over a range of $k$ values and then combine the scores for each point, for example by taking their maximum or their mean.
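The combination step might look like the sketch below. The names `ensemble_scores`, `COMBINERS`, and `score_fn` are hypothetical: `score_fn(points, k)` stands for any function that returns one outlier score per point for a given $k$, such as an LOF implementation. Maximum and mean are shown as two possible combiners, not as the only options.

```python
from statistics import mean

# Two simple ways to merge one point's scores across several k values.
COMBINERS = {"max": max, "mean": mean}


def ensemble_scores(points, k_values, score_fn, how="max"):
    """Score the points once per k, then combine the scores point by point."""
    combine = COMBINERS[how]
    per_k = [score_fn(points, k) for k in k_values]
    # zip(*per_k) regroups the scores so that each tuple holds
    # the scores of a single point across all k values.
    return [combine(point_scores) for point_scores in zip(*per_k)]
```

Taking the maximum flags a point if it looks anomalous for any $k$ in the range, while the mean smooths out values of $k$ that happen to give extreme scores.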

References:

1. Book: *Outlier Analysis* by Charu Aggarwal
2. Wikipedia