Selection of K in K-means clustering
2005
Abstract: The K-means algorithm is a popular data-clustering algorithm. However, one of its drawbacks is the requirement for the number of clusters, K, to be specified before the algorithm is applied. This paper first reviews existing methods for selecting the number of clusters for the algorithm. Factors that affect this selection are then discussed and a new measure to assist the selection is proposed. The paper concludes with an analysis of the results of using the proposed measure to determine the number of clusters for the K-means algorithm for different data sets.
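To make the drawback concrete, the following is a minimal sketch of plain K-means (Lloyd's algorithm) in which K has to be supplied by the caller, followed by a loop that records the distortion (sum of squared distances to the nearest centroid) for several candidate values of K. It is not the measure proposed in the paper, which the abstract does not describe. The names `kmeans`, `sq_dist`, `mean_point`, and `distortion_by_k`, the toy data, and the seeded initialization are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; k must be chosen by the caller.

    Returns (centroids, distortion). Initial centroids are k distinct
    data points drawn with a seeded generator (an illustrative choice).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    distortion = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, distortion


# Toy data (hypothetical): three well-separated groups of four points each.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

# Distortion for each candidate K; the algorithm itself never picks K.
distortion_by_k = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, d in distortion_by_k.items():
    print(k, round(d, 3))
```

Distortion generally keeps shrinking as K grows, so it cannot be minimized directly to choose K; this is why a separate selection measure, such as the one the paper proposes, is needed.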
Keywords:
- k-medians clustering
- Correlation clustering
- Population-based incremental learning
- Nearest-neighbor chain algorithm
- Determining the number of clusters in a data set
- FSA-Red Algorithm
- Machine learning
- CURE data clustering algorithm
- Canopy clustering algorithm
- Mathematics
- Pattern recognition
- Artificial intelligence
- Affinity propagation
- Data mining
References: 29
Citations: 350