Improving the Clustering Algorithms Automatic Generation Process with Cluster Quality Indexes.

2020 
AutoClustering is a computational tool for the automatic generation of clustering algorithms. It combines and evaluates the main parts of density-based algorithms to generate solutions better suited to a given dataset in clustering tasks. AutoClustering uses the Estimation of Distribution Algorithms (EDA) evolutionary technique to create the algorithms (individuals), and an adapted version of the CLEST method (originally designed to determine the best number of groups for a dataset) to compute individual fitness using a decision-tree classifier. To improve the quality of the results generated by AutoClustering, and to avoid possible bias introduced by the adoption of a classifier, this work proposes to increase the efficiency of the evaluation process by adding a quality metric based on a fusion of three cluster quality indexes. The three indexes are Silhouette, Dunn, and Davies-Bouldin, which assess intra-cluster and inter-cluster conditions using distance-based computations that are independent of how the groups were generated. The final score for a specific solution (algorithm + parameters) is the average of the normalized quality metric and the normalized fitness. Compared with traditional AutoClustering, the proposal produced solutions with higher cluster quality metrics, higher average fitness, and higher diversity of generated individuals (clustering algorithms).
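The scoring rule described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the three indexes are normalized or fused, so the choices below are assumptions made only for illustration: Silhouette is rescaled from [-1, 1] to [0, 1], Dunn (higher is better) is mapped with d / (1 + d), Davies-Bouldin (lower is better) is mapped with 1 / (1 + b), and the fusion is a plain mean. The names `IndexValues`, `fused_quality`, and `final_score`, as well as the min-max normalization of fitness, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class IndexValues:
    """Raw index values for one clustering solution (hypothetical container)."""
    silhouette: float      # in [-1, 1], higher is better
    dunn: float            # in [0, inf), higher is better
    davies_bouldin: float  # in [0, inf), lower is better


def fused_quality(ix: IndexValues) -> float:
    """Fuse the three indexes into one value in [0, 1].

    Assumption: each index is mapped to [0, 1] with "higher is better",
    then the three values are averaged. The paper may use another scheme.
    """
    s = (ix.silhouette + 1.0) / 2.0        # rescale [-1, 1] to [0, 1]
    d = ix.dunn / (1.0 + ix.dunn)          # squash [0, inf) into [0, 1)
    b = 1.0 / (1.0 + ix.davies_bouldin)    # invert so that lower DB scores higher
    return (s + d + b) / 3.0


def final_score(ix: IndexValues, fitness: float,
                fitness_min: float = 0.0, fitness_max: float = 1.0) -> float:
    """Average of the normalized quality metric and the normalized fitness.

    Assumption: fitness is min-max normalized over a known range.
    """
    if fitness_max <= fitness_min:
        raise ValueError("fitness_max must be greater than fitness_min")
    norm_fitness = (fitness - fitness_min) / (fitness_max - fitness_min)
    return (fused_quality(ix) + norm_fitness) / 2.0


if __name__ == "__main__":
    candidate = IndexValues(silhouette=0.5, dunn=1.0, davies_bouldin=1.0)
    print(round(final_score(candidate, fitness=0.8), 4))
```

Because the quality term is computed from distances alone, it does not depend on the classifier used for fitness, which matches the stated motivation of reducing classifier bias in the evaluation.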