Indexes to Find the Optimal Number of Clusters in a Hierarchical Clustering.

José David Martín-Fernández, José María Luna-Romera, Beatriz Pontes, José Cristóbal Riquelme Santos

2019

Clustering analysis is one of the most commonly used techniques for uncovering patterns in data mining. Most clustering methods require the number of clusters to be fixed beforehand. However, given the size of the datasets currently in use, estimating that value is computationally expensive in most cases. In this article, we present a clustering technique that avoids this requirement by using hierarchical clustering. There are many examples of this procedure in the literature, most of which focus on the divisive (descending) subtype, whereas this article covers the agglomerative (ascending) subtype. Although it is more expensive in computation and time, it yields very valuable information about the membership of elements in clusters and about how clusters are grouped, that is to say, the dendrogram. Finally, we test the developed algorithm on several datasets of varying dimensionality. For each of them, we compute internal validation indexes and study which index gives the best results for obtaining the best possible clustering.
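
The general idea can be illustrated with a minimal sketch: build the agglomerative hierarchy once, then score each level of it with an internal validation index and keep the number of clusters with the best score. The abstract does not say which linkage criterion or which indexes the authors use, so average linkage and the mean silhouette index below are assumptions chosen for illustration, and the function names (agglomerate, silhouette, best_k) are hypothetical. This is not the authors' implementation.

```python
import math
from itertools import combinations


def average_linkage(points, c1, c2):
    """Mean pairwise Euclidean distance between two clusters (assumed linkage)."""
    total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points):
    """Bottom-up clustering; returns {k: partition with k clusters}.

    Each partition is a list of clusters, each cluster a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Merge the pair of clusters that is closest under the linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(points, clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        levels[len(clusters)] = [list(c) for c in clusters]
    return levels


def silhouette(points, clusters):
    """Mean silhouette index (assumed index); singleton clusters score 0.

    Assumes at least two clusters and no duplicate points.
    """
    scores = []
    for ci, cluster in enumerate(clusters):
        for i in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(math.dist(points[i], points[j]) for j in cluster if j != i)
            a /= len(cluster) - 1
            # b: mean distance to the nearest other cluster.
            b = min(
                sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for co, other in enumerate(clusters)
                if co != ci
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points):
    """Score every level with 2 <= k < n and return (best k, all scores)."""
    levels = agglomerate(points)
    scores = {
        k: silhouette(points, part)
        for k, part in levels.items()
        if 2 <= k < len(points)
    }
    return max(scores, key=scores.get), scores
```

Calling best_k on a list of coordinate tuples returns the selected number of clusters together with the score for every candidate k, so the indexes can be compared level by level. The naive merge loop is far slower than an optimized library routine and is only meant for small inputs.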
