Improving K-Means Effectiveness and Efficiency with Initialization Estimates of Cluster Centroids

2021 
K-Means is known both for its usefulness in finding clusters of related data and for its fragility with respect to initialization choices. This paper introduces an initialization method that is 95% more effective and 50% more efficient, and that could eliminate the need for multiple executions of K-Means to find a high-quality clustering. To initialize the centroids, it selects a multiple, m, of K real data points, computes (mK)^2 distances, and keeps only the K points that maximize the minimum distance, max(min(distance)). A consequence of this technique is that it enables an O(ln K) binary search for the optimal K on "linearly" separable clusters. The effectiveness claim applies to both separable and intertwined clusters, although the efficiency gain is lost on intertwined clusters.
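The selection step described in the abstract can be illustrated with a minimal Python sketch. It is one plausible reading of the max(min(distance)) rule, not the paper's implementation: the function name `maximin_init`, the default `m=4`, the uniform random sampling of candidates, the Euclidean metric, and the greedy farthest-first order (starting from an endpoint of the farthest candidate pair) are all assumptions made for illustration.

```python
import math
import random


def maximin_init(points, k, m=4, seed=0):
    """Pick k initial centroids from about m*k sampled data points.

    Hypothetical helper: the greedy max-min order below is an assumed
    interpretation of the abstract, not the authors' exact procedure.
    """
    if k > len(points):
        raise ValueError("k cannot exceed the number of data points")
    rng = random.Random(seed)
    # Sample m*k real data points as candidates (capped at the dataset size).
    n_cand = min(len(points), m * k)
    cand = rng.sample(points, n_cand)
    # Pairwise Euclidean distances among candidates: (mK)^2 entries.
    dist = [[math.dist(a, b) for b in cand] for a in cand]
    # Assumption: seed the selection with one endpoint of the farthest pair.
    first = max(range(n_cand), key=lambda i: max(dist[i]))
    chosen = [first]
    # min_d[i] is the distance from candidate i to its nearest chosen point.
    min_d = dist[first][:]
    while len(chosen) < k:
        # Keep the candidate whose minimum distance to the chosen set is largest.
        nxt = max(range(n_cand), key=lambda i: min_d[i])
        chosen.append(nxt)
        min_d = [min(min_d[i], dist[nxt][i]) for i in range(n_cand)]
    return [cand[i] for i in chosen]
```

The returned points could then be passed to any K-Means implementation as its starting centroids. Because the candidates are real data points and each new pick is as far as possible from those already kept, well-separated clusters tend to receive one starting centroid each.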