A Framework for Deep Constrained Clustering
Citations: 0 · References: 34 · Related Papers: 10
Abstract:
The area of constrained clustering has been extensively explored by researchers and used by practitioners. Constrained clustering formulations exist for popular algorithms such as k-means, mixture models, and spectral clustering, but they have several limitations. A fundamental strength of deep learning is its flexibility, and here we explore a deep learning framework for constrained clustering and, in particular, how it can extend the field of constrained clustering. We show that our framework can not only handle standard together/apart constraints (without the well-documented negative effects reported earlier) generated from labeled side information, but also more complex constraints generated from new types of side information such as continuous values and high-level domain knowledge. Furthermore, we propose an efficient training paradigm that is generally applicable to these four types of constraints. We validate the effectiveness of our approach with empirical results on both image and text datasets. We also study the robustness of our framework when learning with noisy constraints and show how different components of our framework contribute to the final performance. Our source code is available at https://github.com/blueocean92/deep_constrained_clustering.
Keywords:
Robustness
Constrained clustering
Code (set theory)
Spectral Clustering
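The framework in the abstract above trains a clustering network with together/apart (must-link/cannot-link) side information. As a rough, assumption-laden sketch of how such pairwise constraints can be expressed as a differentiable loss on soft cluster assignments, the snippet below uses a soft-assignment matrix `p` of shape (n, k); the function name and the particular loss form are only one common formulation, not necessarily the authors' exact implementation (see the linked repository for that).

```python
import torch

def pairwise_constraint_loss(p, must_links, cannot_links, eps=1e-8):
    """Sketch: penalize soft cluster assignments p (n x k, rows sum to 1) that
    violate must-link / cannot-link pairs. For a must-link pair (i, j) the
    probability that both points share a cluster, sum_k p[i,k]*p[j,k],
    should be high; for a cannot-link pair it should be low."""
    loss = p.new_zeros(())
    for i, j in must_links:
        same = torch.sum(p[i] * p[j])            # P(i and j share a cluster)
        loss = loss - torch.log(same + eps)
    for i, j in cannot_links:
        same = torch.sum(p[i] * p[j])
        loss = loss - torch.log(1.0 - same + eps)
    return loss

# Toy usage: 4 points, 2 clusters, one must-link pair and one cannot-link pair.
logits = torch.randn(4, 2, requires_grad=True)
p = torch.softmax(logits, dim=1)
pairwise_constraint_loss(p, must_links=[(0, 1)], cannot_links=[(2, 3)]).backward()
```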
The spectral clustering algorithm is an increasingly popular data clustering method derived from spectral graph theory. Spectral clustering builds the affinity matrix of the dataset and solves an eigenvalue decomposition of this matrix to obtain a low-dimensional embedding of the data for subsequent clustering. A semi-supervised spectral clustering algorithm makes use of prior knowledge about the dataset, which improves clustering performance. In this paper, a semi-supervised spectral clustering algorithm based on rough sets is proposed, extending rough set theory to spectral clustering. The algorithm organizes the clustering into a two-tier structure of upper and lower approximations, which can be used to handle overlapping clusters in the dataset. Experiments show that, compared with existing algorithms, the modified algorithm obtains better clustering performance.
Spectral Clustering
Data stream clustering
Single-linkage clustering
Citations (0)
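For context, the sketch below shows the plain spectral clustering pipeline that the rough-set extension above builds on: a Gaussian affinity matrix, the normalized graph Laplacian, an eigendecomposition, and k-means on the resulting embedding. The function name `spectral_clustering` and the bandwidth parameter `sigma` are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Minimal sketch of the standard spectral clustering pipeline (no rough-set
    extension): Gaussian affinity matrix, normalized graph Laplacian,
    eigendecomposition, then k-means on the spectral embedding."""
    # Pairwise squared distances and Gaussian (RBF) affinities.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

    # Embedding from the k smallest eigenvectors, rows renormalized.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```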
Spectral clustering employs the spectral-graph structure of a similarity matrix to partition data into disjoint, meaningful groups. Because of its well-defined mathematical framework, good performance, and simplicity, spectral clustering has gained considerable attention in recent years. Despite these virtues, it suffers from several drawbacks: it is unable to determine a reasonable number of clusters, it is sensitive to the initial conditions, and it is not robust to outliers. In this paper, we present a new approach named density peak spectral clustering (DPSC), which combines spectral clustering with the density peak clustering algorithm (DPCA) in a unified framework to solve these problems. Since multi-view data is common in clustering problems, we further propose co-trained density peak spectral clustering (Co-DPSC), an extension of DPSC to multiple views based on the co-training idea, which boosts clustering performance by using complementary information from the different views. Experimental comparisons with a number of baselines on a toy dataset and three real-world datasets show the effectiveness of the proposed DPSC and Co-DPSC algorithms.
Spectral Clustering
Single-linkage clustering
Disjoint sets
Data stream clustering
Clustering high-dimensional data
Citations (8)
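The density-peak step that DPSC reportedly combines with spectral clustering can be sketched as follows: local density counts neighbors within a cutoff distance, and each point's separation is the distance to the nearest point of higher density; points scoring high on both are candidate cluster centers. The names `density_peaks`, `d_c`, and `n_centers` are illustrative; this is a minimal reconstruction of the standard density peak idea, not the paper's code.

```python
import numpy as np

def density_peaks(X, d_c, n_centers):
    """Sketch of density peak center selection: rho_i counts neighbors within
    the cutoff d_c, delta_i is the distance to the nearest point of higher
    density (or the farthest point, for the densest point)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (dist < d_c).sum(axis=1) - 1            # exclude the point itself
    delta = np.zeros(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = dist[i].max() if len(higher) == 0 else dist[i, higher].min()
    score = rho * delta
    return np.argsort(-score)[:n_centers]         # indices of candidate centers
```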
Multi-view spectral clustering is one of the most widely studied multi-view clustering methods. Its first step is to construct a similarity matrix for each view, so clustering performance is strongly affected by the quality of each view's similarity matrix. To address this problem, an improved multi-view spectral clustering method based on tissue-like P systems is proposed in this paper. The optimal per-view similarity matrix is generated in an iterative manner. In addition, spectral clustering is combined with symmetric nonnegative matrix factorization to output the clustering results directly, avoiding a secondary step such as k-means or spectral rotation. Furthermore, the improved multi-view spectral clustering is integrated with the tissue-like P system to enhance the computational efficiency of the multi-view clustering algorithm. Extensive experiments verify the effectiveness of this algorithm over other state-of-the-art algorithms.
Spectral Clustering
Similarity (geometry)
Biclustering
Matrix (chemical analysis)
Data stream clustering
Citations (1)
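The symmetric nonnegative matrix factorization step mentioned above, which lets a similarity matrix be turned directly into cluster labels without a follow-up k-means, can be sketched roughly as below. It uses the standard multiplicative update with beta = 0.5 and reads each point's cluster from the largest entry of its row of H; this is a generic SymNMF sketch, not the tissue-like P system construction from the paper.

```python
import numpy as np

def symmetric_nmf(A, k, n_iter=200, eps=1e-10, seed=0):
    """Sketch of symmetric NMF as a clustering step: factor a nonnegative,
    symmetric similarity matrix A (n x n) as A ~ H H^T with H >= 0 (n x k),
    then read each point's cluster from the largest entry in its row of H."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    H = rng.random((n, k))
    for _ in range(n_iter):
        # Standard multiplicative update with beta = 0.5.
        H *= 0.5 + 0.5 * (A @ H) / (H @ (H.T @ H) + eps)
    return H.argmax(axis=1)
```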
Spectral clustering has become one of the most popular clustering algorithms in recent years. In real-world clustering problems, the data points may contain considerable noise, and to the best of our knowledge no single clustering algorithm is able to identify all the different types of cluster structure. Existing spectral clustering methods make little effort to explicitly handle both the possibly considerable noise in the data points and the robustness of the clustering method, which often degrades clustering performance. In this paper, motivated by resampling and matrix aggregation, we propose a method for robust spectral clustering. We first construct multiple transition probability matrices, each built from a subset of randomly selected features. These matrices are then used to recover a shared low-rank similarity matrix, which serves as the input to spectral clustering, together with several sparse matrices that represent the noise. The corresponding optimization problem has a low-rank constraint on the transition probability matrix and is solved with a procedure based on the Augmented Lagrangian Method of Multipliers. Experimental results on several real-world datasets show that our method outperforms several state-of-the-art clustering methods.
Spectral Clustering
Constrained clustering
Data stream clustering
Clustering high-dimensional data
Robustness
k-medians clustering
Citations (5)
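A heavily simplified sketch of the two ingredients described above follows: building several row-stochastic transition matrices from random feature subsets, and splitting a matrix into a low-rank part plus a sparse residual. The alternating truncated-SVD/soft-thresholding loop is only a stand-in for the paper's augmented Lagrangian solver, and all function names and parameters (`feat_frac`, `lam`, `rank`) are assumptions made for illustration.

```python
import numpy as np

def transition_matrices(X, n_views, feat_frac=0.5, sigma=1.0, seed=0):
    """Build several row-stochastic transition matrices, each from a random
    subset of features (a hedged reconstruction of the abstract's first step)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mats = []
    for _ in range(n_views):
        cols = rng.choice(d, size=max(1, int(feat_frac * d)), replace=False)
        sq = np.sum((X[:, None, cols] - X[None, :, cols]) ** 2, axis=-1)
        W = np.exp(-sq / (2 * sigma ** 2))
        mats.append(W / W.sum(axis=1, keepdims=True))
    return mats

def lowrank_plus_sparse(M, rank, lam=0.1, n_iter=30):
    """Simplified alternating scheme (not the paper's ALM solver): split M into
    a rank-`rank` part L, usable as the similarity input to spectral
    clustering, and a sparse residual S standing in for noise."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)   # soft threshold
    return L, S
```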
Semi-supervised clustering aims to incorporate known prior knowledge into the clustering algorithm. Pairwise constraints and constraint projections are two popular techniques in semi-supervised clustering, but both consider only the given constraints and ignore the neighbors of the constrained data points. This paper presents a new technique, constraint neighborhood projections, that uses the constrained pairwise data points together with their neighbors; it requires fewer labeled data points (constraints) and can naturally deal with constraint conflicts. It consists of two steps: 1) constraint neighbors are chosen according to the pairwise constraints and a given radius, so that the pairwise constraint relationships can be extended to their neighbors; and 2) the original data points are projected into a new low-dimensional space learned from the pairwise constraints and their neighbors. A CNP-Kmeans algorithm is developed based on the constraint neighborhood projections. Extensive experiments on University of California Irvine (UCI) datasets demonstrate the effectiveness of the proposed method. Our study also shows that constraint neighborhood projections (CNP) has some favorable features compared with previous techniques.
Constrained clustering
Data point
Citations (30)
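Step 1 of the approach above, extending pairwise constraints to the neighborhoods of the constrained points, might look roughly like the sketch below. The radius-based neighborhood rule and the names `expand_constraints` and `radius` are assumptions for illustration; the sketch does not implement the paper's conflict handling or the subsequent projection step.

```python
import numpy as np

def expand_constraints(X, must_links, cannot_links, radius):
    """Sketch: every point within `radius` of a constrained point inherits that
    point's pairwise constraints, so a few labeled pairs induce many
    neighborhood-level pairs."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.where(dist[i] <= radius)[0] for i in range(len(X))]

    def expand(pairs):
        out = set()
        for i, j in pairs:
            for a in neighbors[i]:
                for b in neighbors[j]:
                    if a != b:
                        out.add((min(a, b), max(a, b)))
        return out

    return expand(must_links), expand(cannot_links)
```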
To address the long running time and large memory consumption of the traditional spectral clustering algorithm on large-scale data sets, an improved spectral clustering algorithm based on density representative points is proposed. The algorithm changes how the similarity matrix for spectral clustering is constructed: instead of building the similarity matrix from all sample points, it builds it from density representative points only. In this way, the size of the similarity matrix that needs to be computed is greatly reduced, the computation of the spectral clustering algorithm is accelerated, and the clustering quality is also improved. Simulation results show that the proposed algorithm can effectively improve the ability of spectral clustering to process large data sets.
Spectral Clustering
Similarity (geometry)
Data stream clustering
Single-linkage clustering
Matrix (chemical analysis)
Citations (0)
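The general idea above, clustering a small set of density-based representative points and then propagating their labels, can be sketched as follows. It leans on scikit-learn's SpectralClustering for the reduced problem and uses a naive O(n^2) density estimate for brevity, so it illustrates the structure of the method rather than the paper's exact algorithm; `n_reps` and `d_c` are illustrative parameters.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_via_representatives(X, k, n_reps, d_c=1.0, seed=0):
    """Sketch: pick high-density representative points, run spectral clustering
    only on them (an n_reps x n_reps similarity matrix instead of n x n), then
    assign every remaining point to the cluster of its nearest representative."""
    # Naive O(n^2) density estimate, for illustration only.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    density = (dist < d_c).sum(axis=1)
    reps = np.argsort(-density)[:n_reps]          # densest points as representatives

    labels_reps = SpectralClustering(
        n_clusters=k, affinity="rbf", random_state=seed
    ).fit_predict(X[reps])

    nearest_rep = dist[:, reps].argmin(axis=1)    # index into `reps`
    return labels_reps[nearest_rep]
```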