Evaluating the numerical instability in fuzzy clustering validation of high-dimensional data

2019 
Abstract Fuzzy clustering validation of high-dimensional datasets is only possible using a reliable cluster validity index (CVI). A good CVI must correctly recognize the data structure, and its validation must be independent of any parameter of the clustering algorithm and of any data property. However, some classical fuzzy CVIs, such as Partition Coefficient (PC), Partition Entropy (PE) and Fukuyama-Sugeno (FS), tend to vary monotonically with the number of clusters. Although the literature presents extensive investigations of this tendency, they were conducted on low-dimensional data, for which dimensionality does not affect the clustering behavior. To investigate how these aspects affect the fuzzy clustering results of high-dimensional data, in this work we clustered the objects of thirteen real datasets using the Fuzzy c-Means algorithm. The fuzzy partitions were validated by PC, PE, FS and some improvements proposed to deal with their monotonic tendency, totaling eight fuzzy CVIs analyzed. Besides analyzing the number of clusters selected by each CVI, we performed the Mann-Kendall test to verify statistically the monotonic trend of the CVI results. According to both analyses, the Modified Partition Coefficient and Scaled Partition Entropy indices succeeded in improving the PC and PE indices, respectively.
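The abstract does not give formulas, so the following is only a minimal sketch using the commonly cited definitions of PC, PE and the Modified Partition Coefficient, together with the S statistic on which the Mann-Kendall test is based. The function names, the natural logarithm in PE, and the list-of-rows layout of the membership matrix are assumptions made for illustration, not the authors' implementation.

```python
import math


def partition_coefficient(u):
    """PC: sum of squared memberships divided by the number of objects.

    u is a membership matrix as a list of rows, one row per object and
    one column per cluster (layout assumed for this sketch).
    """
    n = len(u)
    return sum(m * m for row in u for m in row) / n


def partition_entropy(u):
    """PE: negative mean of u * log(u); natural log assumed, 0*log(0) taken as 0."""
    n = len(u)
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / n


def modified_partition_coefficient(u):
    """MPC: PC rescaled by the number of clusters c, so the range is [0, 1]."""
    c = len(u[0])
    return 1.0 - (c / (c - 1.0)) * (1.0 - partition_coefficient(u))


def mann_kendall_s(values):
    """Mann-Kendall S statistic: sum of signs of all later-minus-earlier differences.

    A strongly positive or negative S suggests a monotonic trend; the full
    test also needs the variance of S, which is omitted here.
    """
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s


# A crisp partition and a maximally fuzzy partition of two objects into two clusters.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
print(partition_coefficient(crisp), modified_partition_coefficient(fuzzy))
```

For the maximally fuzzy partition PC equals 1/c, which shrinks as the number of clusters c grows; MPC maps that same case to 0 for every c, which is the kind of correction for monotonic tendency the abstract refers to. Applying `mann_kendall_s` to a CVI's values over an increasing number of clusters gives a rough indication of such a trend.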