A Framework for Deep Constrained Clustering
Citations: 0 · References: 34 · Related Papers: 10
Abstract:
The area of constrained clustering has been extensively explored by researchers and used by practitioners. Constrained clustering formulations exist for popular algorithms such as k-means, mixture models, and spectral clustering, but they have several limitations. A fundamental strength of deep learning is its flexibility, and here we explore a deep learning framework for constrained clustering and, in particular, how it can extend the field of constrained clustering. We show that our framework can not only handle standard together/apart constraints (without the well-documented negative effects reported earlier) generated from labeled side information, but also more complex constraints generated from new types of side information such as continuous values and high-level domain knowledge. Furthermore, we propose an efficient training paradigm that is generally applicable to these four types of constraints. We validate the effectiveness of our approach with empirical results on both image and text datasets. We also study the robustness of our framework when learning with noisy constraints and show how different components of our framework contribute to the final performance. Our source code is available at https://github.com/blueocean92/deep_constrained_clustering.
Keywords:
Robustness
Constrained clustering
Code (set theory)
Spectral Clustering
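The framework in the abstract above trains a clustering network with together/apart (must-link/cannot-link) side information. As a rough, assumption-laden sketch of how such pairwise constraints can be expressed as a differentiable loss on soft cluster assignments, the snippet below uses a soft-assignment matrix `p` of shape (n, k); the function name and the particular loss form are only one common formulation, not necessarily the authors' exact implementation (see the linked repository for that).

```python
import torch

def pairwise_constraint_loss(p, must_links, cannot_links, eps=1e-8):
    """Sketch: penalize soft cluster assignments p (n x k, rows sum to 1) that
    violate must-link / cannot-link pairs. For a must-link pair (i, j) the
    probability that both points share a cluster, sum_k p[i,k]*p[j,k],
    should be high; for a cannot-link pair it should be low."""
    loss = p.new_zeros(())
    for i, j in must_links:
        same = torch.sum(p[i] * p[j])            # P(i and j share a cluster)
        loss = loss - torch.log(same + eps)
    for i, j in cannot_links:
        same = torch.sum(p[i] * p[j])
        loss = loss - torch.log(1.0 - same + eps)
    return loss

# Toy usage: 4 points, 2 clusters, one must-link pair and one cannot-link pair.
logits = torch.randn(4, 2, requires_grad=True)
p = torch.softmax(logits, dim=1)
pairwise_constraint_loss(p, must_links=[(0, 1)], cannot_links=[(2, 3)]).backward()
```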
The spectral clustering algorithm is an increasingly popular data clustering method derived from spectral graph theory. Spectral clustering builds the affinity matrix of the dataset and solves an eigenvalue decomposition of this matrix to obtain a low-dimensional embedding of the data for subsequent clustering. A semi-supervised spectral clustering algorithm makes use of prior knowledge about the dataset, which improves clustering performance. In this paper, a semi-supervised spectral clustering algorithm based on rough sets is proposed, extending rough set theory to spectral clustering. The algorithm organizes the clustering into a two-tier structure of upper and lower approximations, which can be used to handle overlapping clusters in the dataset. Experiments show that, compared with existing algorithms, the modified algorithm obtains better clustering performance.
Spectral Clustering
Data stream clustering
Single-linkage clustering
Citations (0)
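For context, the sketch below shows the plain spectral clustering pipeline that the rough-set extension above builds on: a Gaussian affinity matrix, the normalized graph Laplacian, an eigendecomposition, and k-means on the resulting embedding. The function name `spectral_clustering` and the bandwidth parameter `sigma` are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Minimal sketch of the standard spectral clustering pipeline (no rough-set
    extension): Gaussian affinity matrix, normalized graph Laplacian,
    eigendecomposition, then k-means on the spectral embedding."""
    # Pairwise squared distances and Gaussian (RBF) affinities.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

    # Embedding from the k smallest eigenvectors, rows renormalized.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```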
Spectral clustering employs the spectral-graph structure of a similarity matrix to partition data into disjoint, meaningful groups. Because of its well-defined mathematical framework, good performance, and simplicity, spectral clustering has gained considerable attention in recent years. Despite these virtues, it suffers from several drawbacks: it is unable to determine a reasonable number of clusters, it is sensitive to the initial conditions, and it is not robust to outliers. In this paper, we present a new approach named density peak spectral clustering (DPSC), which combines spectral clustering with the density peak clustering algorithm (DPCA) in a unified framework to solve these problems. Since multi-view data is common in clustering problems, we further propose co-trained density peak spectral clustering (Co-DPSC), an extension of DPSC to multiple views based on the co-training idea, which boosts clustering performance by using complementary information from the different views. Experimental comparisons with a number of baselines on a toy dataset and three real-world datasets show the effectiveness of the proposed DPSC and Co-DPSC algorithms.
Spectral Clustering
Single-linkage clustering
Disjoint sets
Data stream clustering
Clustering high-dimensional data
Citations (8)
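The density-peak step that DPSC reportedly combines with spectral clustering can be sketched as follows: local density counts neighbors within a cutoff distance, and each point's separation is the distance to the nearest point of higher density; points scoring high on both are candidate cluster centers. The names `density_peaks`, `d_c`, and `n_centers` are illustrative; this is a minimal reconstruction of the standard density peak idea, not the paper's code.

```python
import numpy as np

def density_peaks(X, d_c, n_centers):
    """Sketch of density peak center selection: rho_i counts neighbors within
    the cutoff d_c, delta_i is the distance to the nearest point of higher
    density (or the farthest point, for the densest point)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (dist < d_c).sum(axis=1) - 1            # exclude the point itself
    delta = np.zeros(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = dist[i].max() if len(higher) == 0 else dist[i, higher].min()
    score = rho * delta
    return np.argsort(-score)[:n_centers]         # indices of candidate centers
```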
Multi-view spectral clustering is one of the most widely studied multi-view clustering methods. Its first step is to construct a similarity matrix for each view, so clustering performance is strongly affected by the quality of each view's similarity matrix. To address this problem, an improved multi-view spectral clustering method based on tissue-like P systems is proposed in this paper. The optimal per-view similarity matrix is generated in an iterative manner. In addition, spectral clustering is combined with symmetric nonnegative matrix factorization to output the clustering results directly, avoiding a secondary step such as k-means or spectral rotation. Furthermore, the improved multi-view spectral clustering is integrated with the tissue-like P system to enhance the computational efficiency of the multi-view clustering algorithm. Extensive experiments verify the effectiveness of this algorithm over other state-of-the-art algorithms.
Spectral Clustering
Similarity (geometry)
Biclustering
Matrix (chemical analysis)
Data stream clustering
Citations (1)
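The symmetric nonnegative matrix factorization step mentioned above, which lets a similarity matrix be turned directly into cluster labels without a follow-up k-means, can be sketched roughly as below. It uses the standard multiplicative update with beta = 0.5 and reads each point's cluster from the largest entry of its row of H; this is a generic SymNMF sketch, not the tissue-like P system construction from the paper.

```python
import numpy as np

def symmetric_nmf(A, k, n_iter=200, eps=1e-10, seed=0):
    """Sketch of symmetric NMF as a clustering step: factor a nonnegative,
    symmetric similarity matrix A (n x n) as A ~ H H^T with H >= 0 (n x k),
    then read each point's cluster from the largest entry in its row of H."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    H = rng.random((n, k))
    for _ in range(n_iter):
        # Standard multiplicative update with beta = 0.5.
        H *= 0.5 + 0.5 * (A @ H) / (H @ (H.T @ H) + eps)
    return H.argmax(axis=1)
```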
Spectral clustering has become one of the most popular clustering algorithms in recent years. In real-world clustering problems, the data points may contain considerable noise, and to the best of our knowledge no single clustering algorithm is able to identify all the different types of cluster structure. Existing spectral clustering methods make little effort to explicitly handle both the possibly considerable noise in the data points and the robustness of the clustering method, which often degrades clustering performance. In this paper, motivated by resampling and matrix aggregation, we propose a method for robust spectral clustering. We first construct multiple transition probability matrices, each built from a subset of randomly selected features. These matrices are then used to recover a shared low-rank similarity matrix, which serves as the input to spectral clustering, together with several sparse matrices that represent the noise. The corresponding optimization problem has a low-rank constraint on the transition probability matrix and is solved with a procedure based on the Augmented Lagrangian Method of Multipliers. Experimental results on several real-world datasets show that our method outperforms several state-of-the-art clustering methods.
Spectral Clustering
Constrained clustering
Data stream clustering
Clustering high-dimensional data
Robustness
k-medians clustering
Citations (5)
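A heavily simplified sketch of the two ingredients described above follows: building several row-stochastic transition matrices from random feature subsets, and splitting a matrix into a low-rank part plus a sparse residual. The alternating truncated-SVD/soft-thresholding loop is only a stand-in for the paper's augmented Lagrangian solver, and all function names and parameters (`feat_frac`, `lam`, `rank`) are assumptions made for illustration.

```python
import numpy as np

def transition_matrices(X, n_views, feat_frac=0.5, sigma=1.0, seed=0):
    """Build several row-stochastic transition matrices, each from a random
    subset of features (a hedged reconstruction of the abstract's first step)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mats = []
    for _ in range(n_views):
        cols = rng.choice(d, size=max(1, int(feat_frac * d)), replace=False)
        sq = np.sum((X[:, None, cols] - X[None, :, cols]) ** 2, axis=-1)
        W = np.exp(-sq / (2 * sigma ** 2))
        mats.append(W / W.sum(axis=1, keepdims=True))
    return mats

def lowrank_plus_sparse(M, rank, lam=0.1, n_iter=30):
    """Simplified alternating scheme (not the paper's ALM solver): split M into
    a rank-`rank` part L, usable as the similarity input to spectral
    clustering, and a sparse residual S standing in for noise."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)   # soft threshold
    return L, S
```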
Semi-supervised clustering aims to incorporate known prior knowledge into the clustering algorithm. Pairwise constraints and constraint projections are two popular techniques in semi-supervised clustering, but both consider only the given constraints and ignore the neighbors of the constrained data points. This paper presents a new technique, constraint neighborhood projections, that uses the constrained pairwise data points together with their neighbors; it requires fewer labeled data points (constraints) and can naturally deal with constraint conflicts. It consists of two steps: 1) constraint neighbors are chosen according to the pairwise constraints and a given radius, so that the pairwise constraint relationships can be extended to their neighbors; and 2) the original data points are projected into a new low-dimensional space learned from the pairwise constraints and their neighbors. A CNP-Kmeans algorithm is developed based on the constraint neighborhood projections. Extensive experiments on University of California Irvine (UCI) datasets demonstrate the effectiveness of the proposed method. Our study also shows that constraint neighborhood projections (CNP) has some favorable features compared with previous techniques.
Constrained clustering
Data point
Citations (30)
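Step 1 of the approach above, extending pairwise constraints to the neighborhoods of the constrained points, might look roughly like the sketch below. The radius-based neighborhood rule and the names `expand_constraints` and `radius` are assumptions for illustration; the sketch does not implement the paper's conflict handling or the subsequent projection step.

```python
import numpy as np

def expand_constraints(X, must_links, cannot_links, radius):
    """Sketch: every point within `radius` of a constrained point inherits that
    point's pairwise constraints, so a few labeled pairs induce many
    neighborhood-level pairs."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.where(dist[i] <= radius)[0] for i in range(len(X))]

    def expand(pairs):
        out = set()
        for i, j in pairs:
            for a in neighbors[i]:
                for b in neighbors[j]:
                    if a != b:
                        out.add((min(a, b), max(a, b)))
        return out

    return expand(must_links), expand(cannot_links)
```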
To address the long running time and large memory consumption of the traditional spectral clustering algorithm on large-scale data sets, an improved spectral clustering algorithm based on density representative points is proposed. The algorithm changes how the similarity matrix for spectral clustering is constructed: instead of building the similarity matrix from all sample points, it builds it from density representative points only. In this way, the size of the similarity matrix that needs to be computed is greatly reduced, the computation of the spectral clustering algorithm is accelerated, and the clustering quality is also improved. Simulation results show that the proposed algorithm can effectively improve the ability of spectral clustering to process large data sets.
Spectral Clustering
Similarity (geometry)
Data stream clustering
Single-linkage clustering
Matrix (chemical analysis)
Citations (0)
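The general idea above, clustering a small set of density-based representative points and then propagating their labels, can be sketched as follows. It leans on scikit-learn's SpectralClustering for the reduced problem and uses a naive O(n^2) density estimate for brevity, so it illustrates the structure of the method rather than the paper's exact algorithm; `n_reps` and `d_c` are illustrative parameters.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_via_representatives(X, k, n_reps, d_c=1.0, seed=0):
    """Sketch: pick high-density representative points, run spectral clustering
    only on them (an n_reps x n_reps similarity matrix instead of n x n), then
    assign every remaining point to the cluster of its nearest representative."""
    # Naive O(n^2) density estimate, for illustration only.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    density = (dist < d_c).sum(axis=1)
    reps = np.argsort(-density)[:n_reps]          # densest points as representatives

    labels_reps = SpectralClustering(
        n_clusters=k, affinity="rbf", random_state=seed
    ).fit_predict(X[reps])

    nearest_rep = dist[:, reps].argmin(axis=1)    # index into `reps`
    return labels_reps[nearest_rep]
```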