Improving K-Means Effectiveness and Efficiency with Initialization Estimates of Cluster Centroids

Rajesh Kumar Ojha, Sandeep Kumar Srivastava, Mohit Goyal, Lalan Kumar, Amit Kumar, Chitturi Prasad


2021


K-Means is known both for its usefulness in finding clusters of related data and for its fragility with respect to initialization choices. This paper introduces an initialization method that is 95% more effective and 50% more efficient, and that could eliminate the need for multiple executions of K-Means to find a high-quality clustering. To initialize the centroids, it selects a multiple, m, of K real data points, computes the (mK)^2 distances between them, and keeps only the K points that maximize the minimum distance, max(min(distance)). A consequence of this technique is that an O(ln K) binary search can find the optimal K on "linearly" separable clusters. The effectiveness claim applies to both separable and intertwined clusters, although the efficiency gain is lost on intertwined clusters.
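The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one plausible reading: sample mK candidate points, build their (mK)^2 distance matrix, and greedily keep K candidates so that each new pick maximizes its minimum distance to those already kept (a farthest-first traversal). The function name `maxmin_init`, the default `m=4`, the uniform random sampling of candidates, and the rule for choosing the first point are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def maxmin_init(X, K, m=4, seed=0):
    """Hypothetical max-min centroid initialization (assumed reading of the abstract)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_cand = min(m * K, n)  # cannot sample more candidates than data points
    if K > n_cand:
        raise ValueError("K exceeds the number of available candidates")

    # Assumption: candidates are drawn uniformly at random without replacement.
    cand = X[rng.choice(n, size=n_cand, replace=False)]

    # Full (mK) x (mK) Euclidean distance matrix between candidates.
    diff = cand[:, None, :] - cand[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Assumption: start from one endpoint of the largest pairwise distance.
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    chosen = [first]

    # min_dist[i] = distance from candidate i to its nearest chosen point.
    min_dist = D[first].copy()
    for _ in range(K - 1):
        nxt = int(np.argmax(min_dist))  # max over candidates of the min distance
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, D[nxt])

    return cand[chosen]
```

The returned array has shape (K, d) and contains real data points, so it could be handed to any K-Means implementation that accepts explicit starting centroids. Whether this greedy variant reproduces the effectiveness and efficiency figures quoted above is not something this sketch establishes.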

Keywords:

Algorithm
Centroid
Initialization
Cluster (physics)
Computer science
k-means clustering
