    Elite fuzzy clustering ensemble based on clustering diversity and quality measures
    57 Citations · 85 References · 10 Related Papers
    Consensus clustering, also known as clustering ensembles, is a technique that combines multiple clustering solutions to obtain stable, accurate, and novel results. In recent years, several consensus clustering approaches have been proposed to address practical clustering problems, with varying degrees of success. In this paper, we consider data fragments as the elements of a cluster ensemble framework. We propose a new dissimilarity measure on data fragments and build a consensus function that can handle large-scale clustering problems without compromising accuracy. We evaluate the proposed consensus function on a number of datasets, showing its high performance relative to existing consensus functions.
    Consensus clustering
    Clustering high-dimensional data
    Single-linkage clustering
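The evidence-accumulation idea behind consensus clustering can be sketched in a few lines: build a co-association matrix counting how often each pair of points lands in the same base cluster, then link pairs that co-occur in a majority of the input clusterings. This is a generic illustration of the ensemble principle, not the data-fragment method of the paper above; all names are illustrative.

```python
from itertools import combinations

def co_association(labelings):
    """Fraction of base clusterings in which each pair of points co-occurs."""
    n = len(labelings[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                m[i][j] += 1.0 / len(labelings)
                m[j][i] = m[i][j]
    return m

def consensus_clusters(labelings, threshold=0.5):
    """Link points whose co-association exceeds `threshold`, then return
    the connected components as the consensus clustering."""
    n = len(labelings[0])
    m = co_association(labelings)
    seen, clusters = set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(v for v in range(n)
                         if v not in comp and m[u][v] > threshold)
        seen |= comp
        clusters.append(sorted(comp))
    return clusters
```

Note that the cluster labels of the base clusterings never need to be aligned: the co-association matrix only asks whether two points share a label, not which label.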
    Clustering is a technique adopted in many real-world applications. In general, clustering can be thought of as partitioning data into groups, or subsets, that contain analogous objects. Many clustering techniques, such as the K-Means algorithm, the Fuzzy C-Means (FCM) algorithm, and spectral clustering, have been proposed in the literature. Recently, clustering algorithms have also been applied extensively to mixed data types to evaluate the performance of clustering techniques. This paper presents a survey of various clustering algorithms proposed in the literature, provides insight into the advantages and limitations of some of these techniques, and compares them. The future-enhancement section offers general ideas for improving the existing clustering algorithms to achieve better clustering accuracy.
    Data stream clustering
    Biclustering
    Single-linkage clustering
    Clustering high-dimensional data
    Citations (3)
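As a concrete reference point for the partitional methods surveyed above, Lloyd's K-Means loop fits in a dozen lines. The sketch below works on 1-D data for brevity (illustrative code, not taken from the survey):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns (centroids, assignments).
    Alternates two steps: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: abs(p - centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = sum(members) / len(members)
    return centroids, assign
```

FCM replaces the hard assignment step with graded memberships; spectral clustering instead runs K-Means on eigenvectors of a graph Laplacian.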
    Clustering is a method of data analysis that does not use supervised (labelled) data. Clustering methods that control cluster size are expected to be useful for task-distribution problems, and several such methods have been proposed. We previously proposed Fuzzy Even-sized Clustering Based on Optimization (FECBO) and COntrolled-sized Clustering Based on Optimization (COCBO) as methods that focus on cluster size. However, these methods are susceptible to noise. This issue can be overcome by applying noise clustering, a method that assigns noise to a dedicated noise cluster. In this study, we extend FECBO and COCBO with noise clustering and verify their effectiveness through numerical examples.
    Clustering high-dimensional data
    Data stream clustering
    k-medians clustering
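The noise-clustering idea referenced above can be illustrated with a crisp variant: any point farther than a fixed distance δ from every prototype is sent to a dedicated noise cluster instead of distorting a real one. A minimal 1-D sketch (illustrative names; this is the generic idea, not FECBO/COCBO themselves):

```python
def assign_with_noise(points, centroids, delta):
    """Noise-clustering style assignment: a point whose distance to every
    centroid exceeds `delta` is labelled -1, i.e. put in the noise cluster."""
    labels = []
    for p in points:
        d, c = min((abs(p - m), i) for i, m in enumerate(centroids))
        labels.append(c if d <= delta else -1)
    return labels
```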
    Ensemble clustering combines multiple clustering solutions into a single one, called the consensus, which can produce a more accurate and robust clustering of the data. In this paper, we implement ensemble clustering using Dempster-Shafer evidence theory. Individual clustering solutions are obtained using evidence theory, and a novel diversity measure based on the distance of evidence is proposed for selecting complementary individual solutions. After establishing the correspondence among the labels of the different clustering solutions, the consensus clustering solution is obtained through evidence combination. Experimental results and related analyses show that the proposed approach implements ensemble clustering effectively.
    Consensus clustering
    Single-linkage clustering
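The core operation in an evidence-theoretic ensemble like the one above is Dempster's rule of combination. A minimal sketch for mass functions keyed by frozenset focal elements (this is generic Dempster-Shafer, not the paper's specific clustering pipeline):

```python
def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions whose keys are frozenset
    focal elements, renormalizing away conflicting (empty-intersection) mass."""
    combined, conflict = {}, 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb
    norm = 1.0 - conflict  # total non-conflicting mass
    return {k: v / norm for k, v in combined.items()}
```

For example, combining `{frozenset({'A'}): 0.6, frozenset({'A', 'B'}): 0.4}` with `{frozenset({'A'}): 0.5, frozenset({'A', 'B'}): 0.5}` concentrates 0.8 of the mass on `{'A'}`.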
    Affinity propagation (AP) is a widely used exemplar-based clustering approach with superior efficiency and clustering quality. Nevertheless, a common issue with AP clustering is the presence of excessive exemplars, which limits its ability to aggregate effectively. This research aims to enable AP to aggregate automatically into fewer, more compact clusters, without changing the similarity matrix or customizing the preference parameter as existing enhanced approaches do. An automatic aggregation enhanced affinity propagation (AAEAP) clustering algorithm is proposed, which combines a dependable partitioning clustering approach with AP to achieve this purpose. Whenever the clustering stabilizes and the exemplars emerge, the partitioning approach generates an additional set of results with the same number of clusters. Based on these results, mutually exclusive exemplar detection is conducted on the current AP exemplars, and a pair of exemplars unsuitable for coexistence is identified. This recommendation is then mapped into a novel constraint, designated mutual exclusion and aggregation. Under this constraint, a modified AP clustering model is derived and the clustering is restarted, which can reduce the number of exemplars, adjust the exemplar selection, and redistribute the remaining data points. Automatic detection and clustering are repeated until no mutually exclusive exemplars remain, at which point the clustering is complete and a smaller number of clusters is obtained. Standard classification data sets are used in comparative experiments between AAEAP and other clustering algorithms, with many internal and external clustering evaluation indexes measuring performance. The findings show that AAEAP achieves a substantial automatic aggregation effect while maintaining good clustering quality.
    Data stream clustering
    Constrained clustering
    Affinity propagation
    Single-linkage clustering
    Consensus clustering
    Clustering high-dimensional data
    Citations (1)
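For readers unfamiliar with AP's mechanics: the algorithm alternates two message updates over a similarity matrix, responsibilities r(i,k) and availabilities a(i,k), and points whose self-responsibility plus self-availability is positive become exemplars. A bare-bones sketch of standard AP, without the AAEAP extension (the preference value and damping are illustrative choices):

```python
def affinity_propagation(s, pref, iters=200, damping=0.5):
    """Bare-bones affinity propagation on a full similarity matrix `s`
    (list of lists). `pref` fills the diagonal; a lower preference
    yields fewer exemplars."""
    n = len(s)
    s = [row[:] for row in s]
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # responsibilities: r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            first = max(vals)
            idx = vals.index(first)
            second = max(v for k, v in enumerate(vals) if k != idx)
            for k in range(n):
                best = second if k == idx else first
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best)
        # availabilities: sums of positive responsibilities toward candidate k
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos)
            for i in range(n):
                if i == k:
                    new = total - pos[k]
                else:
                    new = min(0.0, r[k][k] + total - pos[i] - pos[k])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    labels = [max(exemplars, key=lambda k: s[i][k]) for i in range(n)]
    return exemplars, labels
```

On two well-separated 1-D groups with negative squared distance as similarity, this converges to one exemplar per group.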
    Clustering techniques have gained great popularity in neuroscience data analysis, especially for data from complex experimental paradigms where traditional model-based methods are hard to apply. However, many clustering algorithms are available, and even within a single algorithm, choices such as parameter settings and distance metrics are likely to affect the final clustering results. In our previous work, we demonstrated the benefits of integrating clustering results from multiple clustering algorithms, which provides more stable, reproducible, and complete clustering solutions. In this paper, we further inspect the possible influence of the choice of distance metric on clustering analysis.
    Consensus clustering
    Popularity
    Clustering high-dimensional data
    Data stream clustering
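The point about distance metrics is easy to demonstrate: the same point can have a different nearest centroid under different metrics, so the metric choice alone can flip cluster assignments. A toy illustration (names and points are illustrative):

```python
def euclidean(a, b):
    """Straight-line (L2) distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def chebyshev(a, b):
    """Maximum per-coordinate (L-infinity) distance."""
    return max(abs(x - y) for x, y in zip(a, b))

def nearest(p, candidates, metric):
    """Index of the candidate centroid closest to p under `metric`."""
    return min(range(len(candidates)), key=lambda i: metric(p, candidates[i]))
```

For the point (0, 0) and centroids (4, 0) and (3, 3), Euclidean distance prefers the first centroid (4 vs. about 4.24) while Chebyshev distance prefers the second (4 vs. 3).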
    Abstract Cluster analysis is used to categorize consumers into clusters that are homogeneous along a range of variables. In marketing, it is most often applied for market segmentation, product perceptual mapping, and data mining. We discuss two important clustering methods here: hierarchical clustering as a “bottom‐up” procedure, and nonhierarchical clustering as a “top‐down” procedure. Hierarchical clustering begins with each consumer in a cluster by itself; then, using a (dis)similarity metric, similar subjects are merged into the same cluster. We provide an introduction to popular (dis)similarity metrics for continuous and discrete variables, as well as the main hierarchical clustering algorithms. The result of hierarchical classification is a dendrogram: a tree structure that represents the hierarchical relations among all subjects being clustered. Nonhierarchical clustering methods, by contrast, partition the data into a predetermined number of segments and try to minimize some criterion of interest. K‐means clustering, partitioning around medoids (PAM), and fuzzy clustering are among the most popular nonhierarchical algorithms. We also touch on the performance of different clustering algorithms, deciding on the number of clusters, clustering validation, and software available for these algorithms.
    Single-linkage clustering
    Hierarchical clustering
    Complete-linkage clustering
    Brown clustering
    Consensus clustering
    Constrained clustering
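The “bottom-up” procedure described above condenses into a naive agglomerative loop: start with singleton clusters and repeatedly merge the closest pair. The sketch below uses single linkage (distance between clusters = distance between their closest members) on 1-D data and stops at k clusters rather than building the full dendrogram (illustrative, O(n³)):

```python
def single_linkage(points, k):
    """Naive agglomerative clustering with single linkage: repeatedly merge
    the two clusters whose closest members are nearest, until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]
```

Complete linkage swaps the inner `min` for a `max`; recording each merge and its distance instead of stopping at k would yield the dendrogram.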
    Clustering is used to identify the intrinsic grouping of a set of unlabelled data. It can be applied in data-mining exploration and statistical data analysis, and it plays an important role in the current digital environment. As the volume and complexity of data on the internet grow in today’s rapidly evolving landscape, clustering methods become indispensable techniques for finding patterns in the data. Many types of clustering techniques have been developed, including partitioning methods, hierarchical clustering, density-based clustering, model-based clustering, and fuzzy clustering. This study focuses on three of them: k-means clustering; agglomerative hierarchical clustering with Ward’s, complete, and average linkage; and the Self-Organizing Map (SOM). The clustering algorithms are written in Python by modifying code obtained from the Internet. In this project, experiments on visualisation and performance analysis of the selected clustering methods are conducted, and a case study applies the clustering techniques to online product reviews. The visualisation experiments showed that each clustering technique has its own visualisation for cluster analysis, while the predictive-accuracy results indicated that k-means clustering and the self-organizing map (SOM) are the most suitable techniques for cluster analysis. Based on the case study, the accuracy of clustering online product reviews is related to the structure and number of the sentences. Extractive text summarisation with clustering can be improved and further developed for use in customer-review systems, now that the correlation between them is known.
    Hierarchical clustering
    Consensus clustering
    Single-linkage clustering
    Data stream clustering
    Brown clustering
    Clustering high-dimensional data
    Citations (0)
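Of the three techniques in the study above, the Self-Organizing Map is the least standard. A toy 1-D SOM shows its essential mechanism: each sample pulls its best-matching unit, and that unit's grid neighbours, toward it, with learning rate and neighbourhood radius decaying over training (parameters here are illustrative, not those of the study):

```python
import math
import random

def som_1d(data, n_units, epochs=50, lr=0.5, seed=1):
    """Toy 1-D self-organizing map: each sample pulls the best-matching
    unit (BMU) strongly and the BMU's grid neighbours more weakly, with
    learning rate and neighbourhood radius both decaying over the epochs."""
    rng = random.Random(seed)
    w = [rng.uniform(min(data), max(data)) for _ in range(n_units)]
    for t in range(epochs):
        rate = lr * (1.0 - t / epochs)
        radius = max(1.0, (n_units / 2.0) * (1.0 - t / epochs))
        for x in data:
            bmu = min(range(n_units), key=lambda i: abs(x - w[i]))
            for i in range(n_units):
                # Gaussian neighbourhood: nearby units follow the BMU
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                w[i] += rate * h * (x - w[i])
    return w
```

After training on data drawn from two separated values, some units settle near each value, which is what makes the unit weights usable as cluster prototypes.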
    Consensus clustering and meta clustering are two important extensions of the classical clustering problem. Given a set of input clusterings of a dataset, consensus clustering aims to find a single final clustering that fits the data better, in some sense, than any of the existing clusterings, while meta clustering aims to group similar input clusterings together so that users need to examine only a small number of distinct clusterings. In this paper, we present a new approach, MCC (multiple consensus clustering), that explores multiple clustering views of a dataset by combining the two: given a set of input clusterings, MCC employs meta clustering to cluster the input clusterings and then uses consensus clustering to generate a consensus for each cluster of input clusterings. Extensive experiments on 11 real-world data sets demonstrate the effectiveness of the proposed method.
    Consensus clustering
    Single-linkage clustering
    Clustering high-dimensional data
    Data stream clustering
    Constrained clustering
    Citations (4)
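MCC's first stage can be prototyped directly: measure pairwise similarity between input clusterings (the Rand index is one standard choice) and group similar clusterings; a consensus function is then run inside each group. A greedy sketch of the meta-clustering stage (the grouping rule and threshold are illustrative, not the paper's):

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree about
    co-membership (both together or both apart)."""
    pairs = list(combinations(range(len(a)), 2))
    same = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return same / len(pairs)

def meta_cluster(labelings, threshold=0.8):
    """Greedy meta-clustering: add each input clustering to the first
    group whose representative it resembles, else start a new group."""
    groups = []
    for lab in labelings:
        for g in groups:
            if rand_index(g[0], lab) >= threshold:
                g.append(lab)
                break
        else:
            groups.append([lab])
    return groups
```

Comparing via co-membership pairs means two identical partitions with permuted label names still score 1.0, so no label alignment is needed at this stage.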
    Today, clustering-based machine learning algorithms are an important field in data mining, and medical data clustering is one of its core applications, used to predict and identify disease risk factors. At the same time, medical data clustering is a very challenging task owing to the complexity and high frequency of the data. To achieve proper data clustering, this paper proposes a hybrid data clustering algorithm that combines K-Means and Black Hole Entropic Fuzzy Clustering (BHEFC). K-Means is one of the most popular and lowest-cost partition-based clustering algorithms. The hybrid has two modules: first, a number of iterations are executed by the K-Means module; the clustering solutions are then shifted to the second module, entropic fuzzy clustering, so the hybrid gains the advantages of both algorithms. K-Means produces fast clustering solutions thanks to its low computation cost, but it can converge prematurely. To overcome this problem, the second module uses BHEFC, which can handle large amounts of high-frequency medical data. The experiments are conducted with medical practitioners to predict the risk factors of heart-disease patients, so that doctors can give suggestions based on those risk factors. Finally, the efficiency of the proposed hybrid K-Means and BHEFC algorithm is analyzed with three different performance measures.
    Data stream clustering
    Clustering high-dimensional data
    Citations (1)
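The fuzzy stage of such a hybrid alternates two updates: recompute soft memberships from the current centroids, then recompute centroids as membership-weighted means. A 1-D sketch of the standard Fuzzy C-Means updates, which a K-Means stage could hand its centroids to (this is generic FCM, not BHEFC's entropic black-hole variant; names are illustrative):

```python
def fcm_memberships(points, centroids, m=2.0):
    """Fuzzy C-Means membership update for 1-D data with fuzzifier m:
    u[i][j] is how strongly point i belongs to cluster j."""
    u = []
    for p in points:
        d = [abs(p - c) + 1e-12 for c in centroids]  # guard zero distance
        u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1))
                            for k in range(len(d)))
                  for j in range(len(d))])
    return u

def fcm_centroids(points, u, m=2.0):
    """Centroid update: membership-weighted mean of all points."""
    k = len(u[0])
    return [sum((u[i][j] ** m) * p for i, p in enumerate(points)) /
            sum(u[i][j] ** m for i in range(len(points)))
            for j in range(k)]
```

Iterating the two updates from rough initial centroids converges on well-separated data; the soft memberships are what lets the second stage escape a premature crisp assignment.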