Sub-Grid Partitioning Algorithm for Distributed Outlier Detection on Big Data

2018 
Anomaly detection, or outlier detection, has become a major research problem in the era of big data. It is used in many applications, such as removing noise from signals and detecting credit card fraud. One type of outlier detection is density-based outlier detection, whose distinguishing strength is detecting outlier points in regions of different densities. One density-based algorithm is the Local Outlier Factor (LOF), which gives every point a score that identifies its outlierness compared to other points. In this paper, we propose a new algorithm called the Sub-Grid Partition (SGP) algorithm, which helps in calculating the LOF for big data in a distributed environment. SGP splits the tuples into small grids, and each grid is split into sub-grids. Sub-grids on the border are duplicated in every processing node so that the LOF can be calculated for every tuple in these grids. Duplicating sub-grids increases the number of tuples that must be processed but, on the other hand, reduces the network overhead required for communication between processing nodes and the time a processing node sits idle waiting for a requested tuple. Finally, we evaluate the performance of the SGP algorithm through a series of simulation experiments over real data sets.
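The partitioning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact duplication rule, so this sketch assumes a uniform grid over a known range [lo, hi) in every dimension and copies a tuple to each adjacent grid whose border its sub-grid touches. The function name `sgp_partition`, its parameters, and the "core"/"support" labels are hypothetical.

```python
from collections import defaultdict
from itertools import product


def sgp_partition(points, lo, hi, grids_per_dim, subs_per_grid):
    """Assign each point to its home grid and copy border points to neighbours.

    Assumption (not from the paper): every coordinate lies in [lo, hi), the
    space is cut into grids_per_dim grids per dimension, and each grid is cut
    into subs_per_grid sub-grids per dimension.
    """
    cells_per_dim = grids_per_dim * subs_per_grid
    width = (hi - lo) / cells_per_dim  # side length of one sub-grid
    parts = defaultdict(lambda: {"core": [], "support": []})

    for p in points:
        # Global sub-grid index of the point in each dimension.
        cell = [min(int((x - lo) / width), cells_per_dim - 1) for x in p]
        home = tuple(c // subs_per_grid for c in cell)   # owning grid
        local = [c % subs_per_grid for c in cell]        # sub-grid inside it
        parts[home]["core"].append(p)

        # A sub-grid on the low (high) edge of its grid touches the
        # neighbouring grid at offset -1 (+1) in that dimension.
        offsets = []
        for idx in local:
            opts = [0]
            if idx == 0:
                opts.append(-1)
            if idx == subs_per_grid - 1:
                opts.append(1)
            offsets.append(opts)

        for delta in product(*offsets):
            if not any(delta):
                continue  # all-zero offset is the home grid itself
            nb = tuple(h + d for h, d in zip(home, delta))
            if all(0 <= n < grids_per_dim for n in nb):
                parts[nb]["support"].append(p)  # duplicated border tuple

    return parts


if __name__ == "__main__":
    pts = [(1.5, 1.5), (3.5, 1.5), (4.2, 4.2)]
    for grid, part in sorted(sgp_partition(pts, 0.0, 8.0, 2, 4).items()):
        print(grid, part)
```

Under these assumptions, each processing node would receive one grid's "core" tuples plus the duplicated "support" tuples, so neighbours near a grid boundary are available locally when computing LOF. The cost is the extra copies, which matches the trade-off the abstract describes.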