Spectral clustering of high-dimensional data via Nonnegative Matrix Factorization

2015 
Spectral clustering has become a popular subspace clustering algorithm in machine learning and data mining; it aims to find a low-dimensional representation by using the spectrum of a Laplacian matrix. Constructing a discriminative and reliable affinity matrix is key for spectral clustering to achieve good clustering quality. As real-world data grow in both feature dimension and number of samples, constructing a good affinity matrix becomes a challenge. Recently, sparse representation based spectral clustering (SRSC) has proven effective for clustering and has led to promising results on high-dimensional data. SRSC constructs the affinity matrix from sparse representation coefficient vectors. However, it is very time consuming. Additionally, the dimension of the sparse coefficient vector equals the number of samples, which may make the affinity matrix insufficiently discriminative. It is therefore inefficient to apply SRSC to large-scale datasets. To remedy these issues, we propose a new spectral clustering algorithm that constructs the affinity matrix from Nonnegative Matrix Factorization (NMF) coefficient vectors. We call our algorithm NMF based spectral clustering (NMFSC). The dimension of the NMF coefficient vector is independent of the number of samples and significantly smaller than that of the sparse coefficient vector. Therefore, the affinity matrix can be constructed from NMF coefficient vectors at much lower computational cost. Experimental results on several public gene expression profiling (GEP) datasets demonstrate the advantage of NMF coefficients over sparse representation coefficients and suggest that NMFSC is promising for clustering high-dimensional data.
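
The pipeline described above (NMF coefficient vectors, then an affinity matrix, then spectral clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the affinity is computed from the coefficient vectors, so cosine similarity is assumed here; the NMF solver uses the standard multiplicative updates; and all function names (`nmf_coefficients`, `cosine_affinity`, `spectral_embedding`, `kmeans`, `nmfsc`) and parameters such as `rank` are hypothetical.

```python
import numpy as np

def nmf_coefficients(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor nonnegative X (n_features x n_samples) as W @ H using
    multiplicative updates; return H (rank x n_samples).
    Each column of H is the NMF coefficient vector of one sample."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return H

def cosine_affinity(H, eps=1e-12):
    """ASSUMPTION: affinity = cosine similarity between coefficient vectors.
    The abstract does not specify the affinity construction."""
    Hn = H / (np.linalg.norm(H, axis=0, keepdims=True) + eps)
    A = Hn.T @ Hn
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def spectral_embedding(A, k, eps=1e-12):
    """Rows of the k eigenvectors of the normalized Laplacian
    L = I - D^{-1/2} A D^{-1/2} with the smallest eigenvalues."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + eps)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    U = vecs[:, :k]
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + eps)

def kmeans(U, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on the rows of U."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def nmfsc(X, rank, k, seed=0):
    """Hypothetical end-to-end sketch: NMF -> affinity -> spectral clustering."""
    H = nmf_coefficients(X, rank, seed=seed)
    A = cosine_affinity(H)
    U = spectral_embedding(A, k)
    return kmeans(U, k, seed=seed)
```

In this sketch each sample is described by a vector of length `rank`, which is chosen independently of the number of samples; a sparse-representation coefficient vector would instead have one entry per sample. Calling `nmfsc(X, rank=r, k=c)` on a nonnegative matrix `X` with features as rows and samples as columns returns one cluster label per sample.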