Multiple Partitions Aligned Clustering
Abstract:
Multi-view clustering is an important yet challenging task due to the difficulty of integrating information from multiple representations. Most existing multi-view clustering methods explore the heterogeneous information in the space where the data points lie. This common practice can cause significant information loss because of unavoidable noise or inconsistency among views. Since different views admit the same cluster structure, the natural space to work in is the space of all partitions. Orthogonal to existing techniques, in this paper we propose to leverage the multi-view information by fusing partitions. Specifically, we align each partition to a consensus cluster indicator matrix through a distinct rotation matrix. Moreover, a weight is assigned to each view to account for differences in the clustering capacity of the views. Finally, the basic partitions, weights, and consensus clustering are jointly learned in a unified framework. We demonstrate the effectiveness of our approach on several real datasets, where significant improvement is found over other state-of-the-art multi-view clustering methods.
Keywords: Leverage (statistics), Consensus clustering, Constrained clustering
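The core alignment step can be illustrated with a small sketch. This is a hypothetical simplification of the idea described in the abstract, not the authors' exact optimization: each one-hot partition matrix is aligned to a shared consensus through an orthogonal rotation (solved here as an orthogonal Procrustes problem via the SVD), and the consensus is refreshed as the average of the aligned partitions. View weights are omitted for brevity.

```python
import numpy as np

def one_hot(labels, k):
    """Binary cluster indicator matrix (n x k)."""
    H = np.zeros((len(labels), k))
    H[np.arange(len(labels)), labels] = 1.0
    return H

def align(H, Y):
    """Best rotation R (orthogonal Procrustes) so that H @ R approximates Y."""
    U, _, Vt = np.linalg.svd(H.T @ Y)
    return U @ Vt

def consensus(partitions, k, iters=10):
    """Alternate between aligning each partition to the consensus and
    refreshing the consensus as the mean of the aligned partitions."""
    Hs = [one_hot(p, k) for p in partitions]
    Y = Hs[0].copy()                      # initial consensus
    for _ in range(iters):
        aligned = [H @ align(H, Y) for H in Hs]
        Y = np.mean(aligned, axis=0)      # fused soft indicator matrix
    return Y.argmax(axis=1)               # hard consensus labels

# three views agree on the grouping but use permuted label names
p1 = [0, 0, 1, 1, 2, 2]
p2 = [1, 1, 2, 2, 0, 0]
p3 = [2, 2, 0, 0, 1, 1]
labels = consensus([p1, p2, p3], k=3)
```

Because the rotation absorbs the arbitrary label permutation of each view, the three partitions above fuse into a single consistent grouping even though their raw label names disagree.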
The clustering ensemble aims to combine multiple clustering results into a better and more robust consensus clustering. This technique has shown its efficiency in finding unusual clusters, dealing with noise, and integrating clustering solutions from multiple distributed sources. Consensus clustering methods based on a voting mechanism are widely used in the literature. The idea behind majority voting is that the judgement of a group is superior to that of individuals. However, voting-based consensus methods suffer from the problem of assigning an appropriate cluster label to data objects that lack a majority vote. To deal with this ambiguity, as well as with clustering when datasets are too large or when new information can arrive dynamically at any time, we propose a new approach based on a two-stage clustering technique. In the first stage, a clustering ensemble method based on relabeling and voting is used to cluster the data objects: a new set of disjoint sub-clusters is generated by majority vote, where each data object votes for the cluster it belongs to and for its corresponding cluster in each of the other clustering results, and data objects without a majority vote are collected into a new dataset. In the second stage, this new dataset, together with the previously obtained sub-clusters, is processed by an incremental clustering algorithm, which is initialized with the sub-clusters and operates on the new dataset elements. The main advantage of incremental clustering methods is that the system can update its assumptions based on newly available data without re-examining old data. The proposed approach has been evaluated on different datasets, and the experimental results demonstrate its effectiveness and robustness.
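The first-stage voting step can be sketched as follows. This is an illustration under the assumption that the clusterings have already been relabeled to a common label space (the paper's relabeling step is omitted): objects with a strict majority label are assigned, and the rest are set aside for the incremental second stage.

```python
from collections import Counter

def majority_vote(clusterings):
    """Assign each object its strict-majority label across the (already
    relabeled) clusterings; objects with no majority are deferred."""
    n = len(clusterings[0])
    labels, ambiguous = [None] * n, []
    for i in range(n):
        votes = Counter(c[i] for c in clusterings)
        lab, count = votes.most_common(1)[0]
        if count > len(clusterings) / 2:
            labels[i] = lab
        else:
            ambiguous.append(i)   # no majority: handled by the second stage
    return labels, ambiguous

# three clusterings over five objects; object 4 gets three different votes
c1 = [0, 0, 1, 1, 0]
c2 = [0, 0, 1, 1, 1]
c3 = [0, 1, 1, 1, 2]
labels, ambiguous = majority_vote([c1, c2, c3])
# labels -> [0, 0, 1, 1, None]; ambiguous -> [4]
```

In the full method, the ambiguous pool together with the sub-clusters would then seed an incremental clustering algorithm.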
Partitioning a set of objects into homogeneous clusters is a fundamental operation in data mining, needed in a number of data mining tasks. Clustering, or data grouping, is a key technique of data mining: an unsupervised learning task in which one seeks to identify a finite set of categories, termed clusters, that describe the data. The grouping of data into clusters is based on the principle of maximizing intra-class similarity and minimizing inter-class similarity, and the goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how does one decide what constitutes a good clustering? This paper studies various clustering algorithms of data mining and focuses on the basics, requirements, classification, problems, and application areas of clustering algorithms.
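The intra-class/inter-class principle can be made concrete with a minimal k-means sketch, which alternates between assigning each point to its nearest centroid (tightening clusters internally) and moving each centroid to its cluster mean. This is a generic illustration, not a method from the surveyed paper.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: assign points to the nearest centroid, then move
    each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # keep a centroid in place if its cluster happens to be empty
        centroids = np.array([X[labels == j].mean(axis=0)
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels

# two well-separated groups: a good clustering keeps each group together
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels = kmeans(X, k=2)
```

On this toy data the two tight groups end up in different clusters, which is exactly the "maximize intra-class similarity, minimize inter-class similarity" criterion in action.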
High-dimensional data, described by a very large number of features, introduces new issues for clustering. The so-called 'curse of dimensionality', a term originally coined to describe the general increase in time complexity of many computational problems, means that general-purpose clustering algorithms perform poorly in such settings. Accordingly, many works have focused on introducing new techniques and clustering algorithms for handling high-dimensional data. Common to all clustering algorithms is that they require some fundamental measure of similarity among data objects, yet the existing algorithms still leave open research issues. In this review, we summarize the effects of high-dimensional data spaces and their implications for various clustering algorithms. We also present a detailed overview of clustering algorithms of several types (subspace methods, model-based clustering, density-based methods, partition-based methods, etc.), including a more detailed description of recent work and of the advantages and disadvantages of each for the high-dimensionality problem. The scope of future work to extend the present clustering methods and algorithms is discussed at the end.
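One simple response to high dimensionality mentioned across such surveys is to cluster in a lower-dimensional subspace. The sketch below, an illustration rather than any specific surveyed method, projects onto the top principal components before clustering would take place; the informative structure survives the projection while most noisy dimensions are discarded.

```python
import numpy as np

def pca_reduce(X, d):
    """Project the data onto its top-d principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

# 2 informative dimensions buried in 48 pure-noise dimensions
rng = np.random.default_rng(0)
signal = np.vstack([rng.normal(0, 0.1, (10, 2)),   # group one
                    rng.normal(3, 0.1, (10, 2))])  # group two
X = np.hstack([signal, rng.normal(0, 0.1, (20, 48))])
Z = pca_reduce(X, d=2)   # the two groups remain well separated in Z
```

Distance-based clustering applied to Z behaves far better than in the raw 50-dimensional space, where noisy coordinates dilute the similarity measure.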
Clustering is a well-known unsupervised machine learning technique. However, most clustering methods require setting several parameters, such as the number of clusters, the shape of the clusters, or other user- or problem-specific parameters and thresholds. In this paper, we propose a new clustering approach that is fully autonomous, in the sense that it does not require parameters to be pre-defined. The approach is based on data density automatically derived from the mutual distribution of the data in the data space, and is called ADD clustering (Autonomous Data Density based clustering). It is entirely based on the experimentally observable data and is free from restrictive prior assumptions. The new method exhibits highly accurate clustering performance, which is compared on benchmark datasets with competitive alternative approaches. Experimental results demonstrate that ADD clustering significantly outperforms other clustering methods while requiring no restrictive user- or problem-specific parameters or assumptions. The new clustering method is a solid basis for further applications in the field of data analytics.
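The general flavor of density-driven clustering can be sketched as follows. This is a generic illustration, not the ADD algorithm itself: each point's density is its neighbour count within a radius, points are processed densest-first, and a point either joins the cluster of its nearest already-processed neighbour within the radius or, if none exists, seeds a new cluster.

```python
import numpy as np

def density_cluster(X, radius=1.0):
    """Generic density-driven clustering sketch (not the ADD method):
    follow the nearest denser-or-equal neighbour within `radius`,
    or start a new cluster when there is none."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
    density = (D < radius).sum(axis=1)                 # neighbour counts
    order = np.argsort(-density, kind="stable")        # densest first
    labels = -np.ones(n, dtype=int)
    next_label = 0
    for rank, i in enumerate(order):
        done = order[:rank]                            # already-labelled points
        near = done[D[i, done] < radius]
        if near.size == 0:
            labels[i] = next_label                     # local density peak
            next_label += 1
        else:
            labels[i] = labels[near[D[i, near].argmin()]]
    return labels

# two compact blobs far apart: each becomes one cluster
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = density_cluster(X, radius=1.0)
# labels -> [0, 0, 0, 1, 1, 1]
```

Note that the radius here plays the role of a user-set parameter, which is exactly what ADD clustering is designed to avoid by deriving density thresholds from the data itself.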
Though subspace clustering, ensemble clustering, alternative clustering, and multi-view clustering are different approaches motivated by different problems and aiming at different goals, these fields share similar problems. Here we briefly survey these areas from the point of view of subspace clustering and, based on this survey, try to identify problems where the different research areas could learn from each other.
In the current world, there is a need to analyze and extract information from data. Clustering is one such analytical method; it involves partitioning data into groups of similar objects. Each group, known as a cluster, consists of objects that have affinity within the cluster and disparity with objects in other groups. This paper examines and evaluates various data clustering algorithms. The two major categories of clustering approaches are partitional and hierarchical clustering. The algorithms dealt with here are: the k-means clustering algorithm, the hierarchical clustering algorithm, the density-based clustering algorithm, the self-organizing map algorithm, and the expectation-maximization clustering algorithm. All of these algorithms are explained and analyzed based on factors such as the size and type of the dataset, the number of clusters created, quality, accuracy, and performance. The paper also provides information about the tools used to implement the clustering approaches; the purpose of discussing the various software tools is to help beginners and new researchers understand how they work, so that they can come up with new products and approaches.