DBSCOUT: A Density-based Method for Scalable Outlier Detection in Very Large Datasets

Matteo Corain, Paolo Garza, Abolfazl Asudeh

2021

Recent technological advancements have made it possible to generate and collect huge amounts of data every day. This data is used for purposes that may affect us on an unprecedented scale. Understanding the data, including detecting its outliers, is a critical step before using it. Outlier detection has been well studied in the literature, but existing approaches fail to scale to such very large settings. In this paper, we propose DBSCOUT, an efficient exact algorithm for outlier detection with linear complexity that can run in parallel over multiple independent machines, making it suitable for settings with billions of tuples. Besides the theoretical analysis, our experimental results confirm an orders-of-magnitude improvement over existing work, demonstrating the efficiency, scalability, and effectiveness of our approach.
