An Improved Numerical DBSCAN Algorithm Based on Non-IIDness Learning

2021 
Clustering research usually assumes that the objects and attributes of a data set are independent and identically distributed (IID); that is, each object is treated as an independent, identically distributed individual that has no influence on any other object. Real-world objects, however, are often neither independent nor identically distributed. Such non-IID data exhibit complex coupling relationships in which objects interact with one another, so clustering results obtained under the IID assumption may be incomplete or even misleading. To make the results of the DBSCAN algorithm as accurate as possible, this paper proposes an improved numerical DBSCAN algorithm based on non-IIDness learning. The algorithm computes the coupling relationships between objects to capture their latent relationships, and it determines the parameters Eps and MinPts from the distribution characteristics of the data. Experiments on large-scale real and synthetic data sets show that the algorithm achieves higher accuracy than both the original DBSCAN algorithm and the main improved variants of it.
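The abstract does not spell out how the coupling relationships are computed or how Eps and MinPts are derived, so the following is only a minimal sketch of the general idea of choosing Eps from the data distribution. It is not the paper's method: plain Euclidean distance stands in for the coupled similarity, Eps is taken as a quantile of the k-nearest-neighbour distances (a variant of the common k-distance heuristic), and MinPts is supplied by the caller. The function names and the `quantile` parameter are hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix (assumed stand-in for a coupled similarity)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def estimate_eps(dist, min_pts, quantile=0.8):
    """Pick Eps as a quantile of each point's distance to its min_pts-th
    nearest neighbour (the point itself counts as the first neighbour).
    The quantile value is an illustrative assumption."""
    k_dists = sorted(sorted(row)[min_pts - 1] for row in dist)
    return k_dists[int(quantile * (len(k_dists) - 1))]


def dbscan(dist, eps, min_pts):
    """Textbook DBSCAN on a precomputed distance matrix; -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = [m for m in range(n) if dist[j][m] <= eps]
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Two tight groups of four points and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
dist = pairwise_distances(points)
eps = estimate_eps(dist, min_pts=3)
labels = dbscan(dist, eps, min_pts=3)
print(eps, labels)  # 1.0 [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because `dbscan` takes a precomputed distance matrix, a coupling-aware dissimilarity could in principle be substituted for `pairwise_distances` without touching the clustering loop.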