Probabilistic density-based estimation of the number of clusters using the DBSCAN-martingale process

2019 
Abstract Density-based clustering is an effective clustering approach that groups together dense patterns in low- and high-dimensional vectors, especially when the number of clusters is unknown. Such vectors are obtained, for example, when computer scientists represent unstructured data and then group them into clusters in an unsupervised way. Another facet of clustering similar artifacts is the detection of densely connected nodes in network structures, where communities of nodes are formed and need to be identified. To that end, we propose a new DBSCAN algorithm for estimating the number of clusters by optimizing a probabilistic process, namely DBSCAN-Martingale, which involves randomness in the selection of the density parameter. By providing an analytic formula, we minimize the number of iterations that the DBSCAN-Martingale process requires to extract all clusters. Experiments on spatial, textual and visual clustering show that the proposed analytic formula provides a suitable indicator for the optimal number of required iterations to extract all clusters.
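The abstract does not spell out the process, so the following is only a minimal sketch of one plausible reading of it: the density parameter is drawn at random several times, DBSCAN is run for each drawn value in increasing order, and only newly appearing clusters are counted. Everything named here is an assumption made for illustration, including the uniform sampling range `eps_max`, the rule that a cluster counts as new only when none of its points was assigned earlier, the use of the most frequent count over repeated realizations as the estimate, and the helper names `dbscan`, `martingale_realization` and `estimate_num_clusters`. It is not the authors' implementation and does not include their analytic formula for the number of iterations.

```python
import random
from collections import Counter, deque


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on 2-D tuples; returns one label per point (-1 = noise)."""
    n = len(points)

    def neighbors(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2 <= eps * eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nb)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:  # core point: keep expanding
                queue.extend(nbj)
    return labels


def martingale_realization(points, eps_values, min_pts):
    """One realization: run DBSCAN for each eps (ascending), count new clusters.

    Assumption: a cluster is "new" only if none of its points was already
    assigned in an earlier step.
    """
    final = [-1] * len(points)
    n_clusters = 0
    for eps in sorted(eps_values):
        labels = dbscan(points, eps, min_pts)
        for c in set(labels) - {-1}:
            members = [i for i, lab in enumerate(labels) if lab == c]
            if all(final[i] == -1 for i in members):
                for i in members:
                    final[i] = n_clusters
                n_clusters += 1
    return n_clusters, final


def estimate_num_clusters(points, eps_max, min_pts, T, realizations, seed=0):
    """Assumed estimator: most frequent cluster count over random realizations."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(realizations):
        eps_values = [rng.uniform(0.0, eps_max) for _ in range(T)]
        k, _ = martingale_realization(points, eps_values, min_pts)
        counts[k] += 1
    return counts.most_common(1)[0][0]


# Two tight, well-separated groups of four points each (toy data).
POINTS = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
```

In this sketch, `T` plays the role of the number of iterations mentioned in the abstract: a larger `T` makes it more likely that some drawn density value reveals each cluster, at the cost of more DBSCAN runs. For example, `estimate_num_clusters(POINTS, eps_max=1.0, min_pts=3, T=5, realizations=20)` runs twenty realizations of five DBSCAN passes each on the toy data.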