Large-scale subspace clustering by fast regression coding

2017 
Large-Scale Subspace Clustering (LSSC) is an important problem in the big data era. However, most existing methods (e.g., sparse or low-rank subspace clustering) cannot be applied directly to LSSC because their time complexity is quadratic or cubic in n (the number of data points). To overcome this limitation, we propose Fast Regression Coding (FRC), which optimizes regression codes and simultaneously trains a non-linear function to approximate the codes. Using FRC, we develop an efficient Regression Coding Clustering (RCC) framework to solve the LSSC problem. It consists of three steps: sampling, FRC, and clustering. RCC randomly samples a small number of data points, quickly computes the codes of all data points using the non-linear function learned by FRC, and applies a large-scale spectral clustering method to cluster the codes. In addition, we provide a theoretical guarantee that the non-linear function has a first-order approximation ability and a group effect. The theorem shows that the codes can readily be used to construct a similarity graph that is easy to partition. Compared with state-of-the-art LSSC methods, our model achieves better clustering results on large-scale datasets.
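The three-step pipeline described above (sampling, coding, clustering) can be illustrated with a minimal NumPy sketch. It is not the paper's FRC: the learned non-linear function is replaced here by closed-form ridge regression onto the sampled points, which is a linear map, and the large-scale spectral clustering step is replaced by an SVD of a point-to-sample bipartite graph followed by a small k-means. All function names and parameters (`sample_landmarks`, `ridge_codes`, `spectral_embed`, `kmeans`, `rcc_sketch`, `lam`, `m`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_landmarks(X, m, rng):
    """Sampling step: pick m rows of X uniformly at random."""
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return X[idx]


def ridge_codes(X, D, lam=1e-2):
    """Code every row of X against the sampled points D by ridge regression.

    For each row x this solves min_c ||x - c D||^2 + lam ||c||^2, i.e.
    c = x D^T (D D^T + lam I)^{-1}. ASSUMPTION: this linear map stands in
    for the learned non-linear function; it is not the paper's FRC.
    """
    G = D @ D.T + lam * np.eye(D.shape[0])  # (m, m) regularised Gram matrix
    return np.linalg.solve(G, D @ X.T).T    # (n, m) codes


def spectral_embed(C, k, eps=1e-12):
    """Embed points using the bipartite point-to-sample graph W = |C|."""
    W = np.abs(C)
    d_row = W.sum(axis=1) + eps
    d_col = W.sum(axis=0) + eps
    W_hat = W / np.sqrt(d_row)[:, None] / np.sqrt(d_col)[None, :]
    # SVD of an (n, m) matrix costs O(n m^2), i.e. linear in n for fixed m.
    U, _, _ = np.linalg.svd(W_hat, full_matrices=False)
    E = U[:, :k]
    return E / (np.linalg.norm(E, axis=1, keepdims=True) + eps)


def kmeans(E, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [E[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = E[labels == j].mean(axis=0)
    return labels


def rcc_sketch(X, k, m, lam=1e-2, seed=0):
    """Sample m points, code all n points against them, cluster the codes."""
    rng = np.random.default_rng(seed)
    D = sample_landmarks(X, m, rng)
    C = ridge_codes(X, D, lam)
    return kmeans(spectral_embed(C, k), k)
```

Calling `rcc_sketch(X, k=2, m=40)` on an `(n, d)` array returns one integer label per row. Because every step works against only the `m` sampled points, the cost grows linearly in n for fixed `m`, which is the property that motivates the sampling-based design.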