Using Regression Error Analysis and Feature Selection to Automatic Cluster Labeling

2021 
Cluster Labeling Models apply Artificial Intelligence techniques to extract the key features of clustered data, providing a tool for interpreting clusters. For this purpose, we have applied different techniques such as Classification, Regression, Fuzzy Logic, and Data Discretization to identify the attributes that are essential for cluster formation and the ranges of values associated with them. This paper presents an improvement to the Regression-based Cluster Labeling Model: an attribute selection step, based on the coefficient of determination obtained by regression models, is integrated into the model to make it applicable to large datasets. The model was tested on the Iris, Breast Cancer, and Parkinson’s Disease datasets from the literature to evaluate labeling performance on data of different dimensionality. The experimental results show that the model is sound: it provides specific labels for each cluster that represent between 99% and 100% of the cluster's elements for the datasets used.
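The abstract does not spell out how the selection step is computed, so the following is only a minimal sketch of one way an attribute ranking based on the coefficient of determination could look. It assumes that each attribute is regressed on the remaining attributes with ordinary least squares and that the attributes with the highest R² are kept; the function names `r_squared_per_attribute` and `select_attributes`, the parameter `k`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def r_squared_per_attribute(X):
    """Regress each column on the other columns (OLS with intercept)
    and return the in-sample coefficient of determination per column.

    Assumption: this per-attribute regression is an illustrative choice,
    not the paper's documented procedure.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_attrs = X.shape
    scores = np.empty(n_attrs)
    for j in range(n_attrs):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Design matrix: intercept column plus the remaining attributes.
        A = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        # A constant column has no variance to explain; score it as 0.
        scores[j] = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return scores


def select_attributes(X, k):
    """Return the sorted indices of the k attributes with the highest R^2,
    together with all scores (keeping the highest R^2 is an assumption)."""
    scores = r_squared_per_attribute(X)
    order = np.argsort(scores)[::-1]
    return sorted(order[:k].tolist()), scores


# Synthetic demo data (hypothetical): columns 0 and 1 are strongly related,
# column 2 is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2.0 * a + rng.normal(scale=0.1, size=200)
c = rng.normal(size=200)
X_demo = np.column_stack([a, b, c])

selected, scores = select_attributes(X_demo, k=2)
```

In this sketch `selected` holds the indices of the retained attributes, which would then be passed on to the labeling step; whether high or low R² should drive the selection in the actual model is not stated in the abstract.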