Fast and stable clustering analysis based on Grid-mapping K-means algorithm and new clustering validity index

2019 
Abstract
As a classical data analysis technique, clustering plays an important role in identifying the natural structure of a dataset. However, many existing clustering methods, including clustering algorithms and clustering validity indexes (CVIs), still suffer from low efficiency, poor clustering accuracy, poor stability and high sensitivity to noise points. In this paper, the Grid-K-means algorithm, which maps datasets onto grids, is first proposed to overcome the drawbacks of the traditional K-means algorithm. Then, using grid points as weighted representative points to process datasets, a new clustering validity index (BCVI) is designed to better evaluate the quality of the clustering results generated by the Grid-K-means algorithm. Owing to its monotonic property and its formulation as a linear combination of intra-cluster compactness and inter-cluster separation, BCVI needs much less time to find the optimal number of clusters ($K_{opt}$) than the commonly used method, which searches for $K_{opt}$ under the empirical rule $K_{\max} \le \sqrt{n}$. Experiments on many types of datasets demonstrate that the Grid-K-means algorithm is faster and more accurate than traditional algorithms. Meanwhile, experiments comparing BCVI with seven existing CVIs show that BCVI outperforms them in clustering stability and data processing speed.
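The abstract does not spell out the Grid-K-means procedure, so the following Python/NumPy sketch only illustrates the general idea of clustering grid-mapped, weighted representative points, under assumptions that are not taken from the paper: a uniform grid with a user-chosen cell size, each non-empty cell represented by the mean of its points and weighted by its point count, deterministic farthest-first seeding, and Lloyd iterations on the weighted representatives. The helper names (`grid_map`, `farthest_first`, `weighted_kmeans`, `candidate_ks`) and the `cell_size` parameter are hypothetical. `candidate_ks` only enumerates the baseline search range given by the empirical rule $K_{\max} \le \sqrt{n}$; it does not implement BCVI, whose definition is not given here.

```python
import math

import numpy as np


def grid_map(X, cell_size):
    """Map points onto a uniform grid; return one weighted representative per non-empty cell.

    Assumption (not from the paper): a representative is the mean of the points in
    its cell, and its weight is the number of points it replaces.
    """
    cells = np.floor(X / cell_size).astype(np.int64)  # integer cell index of every point
    _, inverse, counts = np.unique(cells, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()  # keep the inverse index 1-D across NumPy versions
    reps = np.zeros((counts.size, X.shape[1]))
    np.add.at(reps, inverse, X)  # sum the points falling into each cell
    reps /= counts[:, None]  # turn per-cell sums into per-cell means
    return reps, counts.astype(float)


def farthest_first(P, w, k):
    """Hypothetical seeding: start at the heaviest cell, then repeatedly take the farthest cell."""
    chosen = [int(np.argmax(w))]
    d2 = ((P - P[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        chosen.append(int(np.argmax(d2)))
        d2 = np.minimum(d2, ((P - P[chosen[-1]]) ** 2).sum(axis=1))
    return P[chosen].copy()


def weighted_kmeans(P, w, k, max_iter=100):
    """Lloyd's K-means run on weighted representatives instead of the raw points."""
    centers = farthest_first(P, w, k)
    for _ in range(max_iter):
        d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)  # assign each cell to its nearest center
        new = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():  # an empty cluster keeps its previous center
                new[j] = np.average(P[mask], axis=0, weights=w[mask])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    # Final assignment against the final centers.
    labels = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centers, labels


def candidate_ks(n):
    """Baseline search range from the empirical rule K_max <= sqrt(n): K = 2, ..., floor(sqrt(n))."""
    return list(range(2, math.isqrt(n) + 1))


# Usage on synthetic data (three well-separated blobs, for illustration only).
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(300, 2)) for c in true_centers])
reps, w = grid_map(X, cell_size=0.5)
centers, labels = weighted_kmeans(reps, w, k=3)
```

Because each representative sits at its cell mean and carries its cell's point count as weight, for a given assignment of cells the weighted center update equals the plain mean of the original points in those cells; the saving comes from iterating over far fewer representatives than raw points. The baseline approach would rerun the clustering and a validity index for every K in `candidate_ks(n)`, which is the cost the abstract says BCVI reduces.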