Fast Cluster Tendency Assessment for Big, High-Dimensional Data

2021 
Assessment of clustering tendency is an important first step in crisp or fuzzy cluster analysis. One tool for assessing cluster tendency is the Visual Assessment of Tendency (VAT) algorithm. The VAT and improved VAT (iVAT) algorithms have been successful in revealing potential cluster structure, in the form of visual images, for various datasets. However, they can be computationally expensive for datasets with a very large number of samples and/or dimensions. Scalable versions of VAT/iVAT, such as sVAT/siVAT, have been proposed to approximate these images. Even so, they remain slow when the data are large in both the number of records and the number of dimensions. In this chapter, we introduce two new algorithms to obtain approximate iVAT images that can be used to visually estimate the potential number of clusters in big data. We compare the two proposed methods with the original version of siVAT on five large, high-dimensional datasets. We demonstrate that both new methods provide visual evidence about potential cluster structure in these datasets in significantly less time than siVAT, with no apparent loss of accuracy or visual acuity.
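To make the VAT and iVAT steps named above concrete, here is a minimal sketch in Python, assuming NumPy and an n-by-n dissimilarity matrix as input. The function names `vat` and `ivat` are illustrative only and do not come from this chapter.

The sketch does two things:

* `vat` reorders the matrix with a Prim's-style traversal, starting from an endpoint of the largest dissimilarity.
* `ivat` applies the recursive iVAT transform. This replaces each entry with the largest edge on the minimum-spanning-tree path between the two points.

This is the plain O(n^2) version. It is not sVAT/siVAT and not either of the chapter's new methods, which exist precisely to avoid building the full matrix.

```python
import numpy as np


def vat(R):
    """Return the VAT-reordered dissimilarity matrix and the ordering P."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    # Start from one endpoint of the largest dissimilarity in R.
    i, _ = np.unravel_index(np.argmax(R), R.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    # dist[k] = smallest dissimilarity from point k to any already-ordered point.
    dist = R[i].copy()
    for _ in range(n - 1):
        candidates = np.flatnonzero(remaining)
        # Next point: the unordered point closest to the ordered set (Prim's step).
        j = candidates[np.argmin(dist[candidates])]
        order.append(int(j))
        remaining[j] = False
        dist = np.minimum(dist, R[j])
    P = np.array(order)
    return R[np.ix_(P, P)], P


def ivat(R_star):
    """Recursive iVAT transform of a VAT-ordered matrix (minimax path distances)."""
    n = R_star.shape[0]
    D = np.zeros_like(R_star, dtype=float)
    for r in range(1, n):
        # Earlier point that r is attached to in the spanning tree.
        j = int(np.argmin(R_star[r, :r]))
        D[r, j] = R_star[r, j]
        others = np.arange(r)
        others = others[others != j]
        # Path from r to c goes through j: keep the largest edge seen so far.
        D[r, others] = np.maximum(R_star[r, j], D[j, others])
        # Mirror row r into column r so later rows can read D[j, c] for any c < r.
        D[:r, r] = D[r, :r]
    return D
```

For feature vectors `X` of shape (n, d), a Euclidean input matrix could be built with `np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)`. Displaying `ivat(vat(R)[0])` as a grayscale image should then show dark diagonal blocks, one per potential cluster. That full n-by-n matrix is exactly the cost that makes this direct approach impractical for the big, high-dimensional data targeted here.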