Improving K-Means Effectiveness and Efficiency with Initialization Estimates of Cluster Centroids
2021
K-Means is known both for its usefulness in finding clusters of related data and for its fragility with respect to initialization choices. This paper introduces an initialization method, reported to be 95% more effective and 50% more efficient, that could eliminate the need for multiple executions of K-Means to find a high-quality clustering. To initialize the centroids, it selects a multiple, m, of K real data points (mK candidates in total), computes (mK)^2 distances, and keeps only the K points that satisfy a maximum(minimum(distance)) criterion. A consequence of this technique is that an O(ln K) binary search can find the optimal K on 'linearly' separable clusters. The effectiveness claim applies to both separable and intertwined clusters, although the efficiency gain is lost on intertwined clusters.
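The selection step can be read in more than one way, since the abstract does not spell out the procedure. The minimal Python sketch below takes one plausible reading: sample mK real data points, compute all pairwise distances among them, then greedily keep the candidate whose minimum distance to the already-kept points is largest (a farthest-first traversal). The function name `maxmin_init`, the default `m=3`, the uniform random sampling, and the choice of the first kept point are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def maxmin_init(points, k, m=3, seed=0):
    """Pick k initial centroids from a random sample of m*k real data points.

    Hypothetical helper: one possible interpretation of the
    maximum(minimum(distance)) rule, not the paper's reference code.
    """
    rng = random.Random(seed)
    n_candidates = min(len(points), m * k)
    if n_candidates < k:
        raise ValueError("need at least k data points")
    # Assumption: candidates are drawn uniformly at random without replacement.
    candidates = rng.sample(points, n_candidates)

    # All pairwise distances among the candidates: the (mK)^2 computation.
    dist = [[math.dist(p, q) for q in candidates] for p in candidates]

    # Assumption: start from the first sampled candidate.
    chosen = [0]
    # min_dist[i] = distance from candidate i to its nearest kept point.
    min_dist = list(dist[0])
    while len(chosen) < k:
        # Keep the candidate whose minimum distance to the kept set is maximal.
        nxt = max(range(n_candidates), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(a, b) for a, b in zip(min_dist, dist[nxt])]
    return [candidates[i] for i in chosen]
```

The returned points would then be passed to any standard K-Means routine as its starting centroids. Under this reading the distance table costs on the order of (mK)^2 evaluations and does not grow with the size of the full data set, which is consistent with the distance count quoted in the abstract.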