Selection of K in K-means clustering

Duc Truong Pham, Stefan Simeonov Dimov, C.D. Nguyen


2005


Abstract: The K-means algorithm is a popular data-clustering algorithm. However, one of its drawbacks is the requirement for the number of clusters, K, to be specified before the algorithm is applied. This paper first reviews existing methods for selecting the number of clusters for the algorithm. Factors that affect this selection are then discussed and a new measure to assist the selection is proposed. The paper concludes with an analysis of the results of using the proposed measure to determine the number of clusters for the K-means algorithm for different data sets.
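
To make the drawback concrete, the following is a minimal sketch of a plain K-means loop (Lloyd's algorithm) in which K must be supplied as an argument, together with a helper that reruns it for several candidate values of K and records the resulting distortion (within-cluster sum of squared distances). This is only an illustration under stated assumptions: the function names `kmeans` and `distortion_by_k`, the random initialisation, and the iteration cap are hypothetical choices made here, and the distortion curve is a common baseline, not the measure proposed in the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. K is a required input, which is the
    drawback the abstract refers to. Returns (centres, labels, distortion).
    `points` is a list of equal-length numeric tuples (an assumption)."""
    rng = random.Random(seed)
    # Initialise the centres with k data points chosen at random.
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append([sum(c) / len(members)
                                    for c in zip(*members)])
            else:
                # An empty cluster keeps its previous centre.
                new_centres.append(centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    # Final assignment and distortion for the returned centres.
    labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
              for p in points]
    distortion = sum(sq_dist(p, centres[lab])
                     for p, lab in zip(points, labels))
    return centres, labels, distortion


def distortion_by_k(points, k_values, seed=0):
    """Run K-means once per candidate K and collect the distortion."""
    return {k: kmeans(points, k, seed=seed)[2] for k in k_values}
```

Because distortion generally keeps falling as K grows, inspecting this curve alone does not single out one value of K, which is part of the motivation for a dedicated selection measure.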

Keywords:

k-medians clustering
Correlation clustering
Population-based incremental learning
Nearest-neighbor chain algorithm
Determining the number of clusters in a data set
FSA-Red Algorithm
Machine learning
CURE data clustering algorithm
Canopy clustering algorithm
Mathematics
Pattern recognition
Artificial intelligence
Affinity propagation
Data mining

