    A Bi-directional Fuzzy C-Means Clustering Ensemble Algorithm Considering Local Information
    Citations: 4 · References: 41 · Related papers: 10
    Abstract:
    The classic fuzzy C-means (FCM) algorithm has limited clustering performance and is prone to misclassifying border points. To overcome these challenges and improve clustering quality, this study proposes a bi-directional FCM clustering ensemble approach that takes local information into account (LI_BIFCM). First, FCM is run multiple times with randomly initialized cluster centers to produce several membership matrices, and a vertical ensemble is performed using the maximum membership principle. Second, after each FCM run, multiple local membership matrices of the sample points are built from multiple K-nearest-neighbor neighborhoods, and a horizontal ensemble is performed; running FCM multiple times therefore yields multiple horizontal ensembles. Finally, the vertical and horizontal ensembles are combined to obtain the final clustering result. Twelve synthetic and real data sets were used for testing. In the experiments, LI_BIFCM outperformed four traditional clustering algorithms and three clustering ensemble algorithms. Furthermore, the final clustering results are only weakly correlated with the bi-directional ensemble parameters, indicating that the proposed technique is robust.
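The abstract describes the pipeline only at a high level, so the Python below is a minimal sketch of the general bi-directional idea under stated assumptions, not the authors' LI_BIFCM. Assumptions: standard FCM updates with fuzzifier m; a point's "local membership" is the average FCM membership over the point and its k nearest neighbours; runs are aligned by matching their cluster centers to the first run (memberships from different runs cannot be combined without some alignment); and all global and local matrices are summed with equal weight before the maximum membership principle is applied. The names `fcm`, `local_membership`, `align` and `bidirectional_fcm` are hypothetical.

```python
import itertools
import math
import random


def fcm(points, c, m=2.0, iters=100, seed=0, eps=1e-9):
    """Standard fuzzy c-means from randomly chosen initial centers.
    Returns (centers, U) where U[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, c)]
    dim = len(points[0])
    for _ in range(iters):
        U = []
        for p in points:
            # eps avoids division by zero when a point coincides with a center
            d = [max(math.dist(p, v), eps) for v in centers]
            U.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                      for j in range(c)])
        for j in range(c):
            w = [row[j] ** m for row in U]
            total = sum(w)
            centers[j] = [sum(wi * p[t] for wi, p in zip(w, points)) / total
                          for t in range(dim)]
    return centers, U


def local_membership(points, U, k):
    """Assumed local membership: average of U over a point and its k nearest neighbours."""
    n = len(points)
    L = []
    for i in range(n):
        # the k+1 closest indices: normally point i itself plus its k nearest neighbours
        nbrs = sorted(range(n), key=lambda t: math.dist(points[i], points[t]))[:k + 1]
        L.append([sum(U[t][j] for t in nbrs) / len(nbrs) for j in range(len(U[0]))])
    return L


def align(ref_centers, centers):
    """Brute-force matching: perm[j] is this run's cluster paired with reference cluster j."""
    c = len(ref_centers)
    return min(itertools.permutations(range(c)),
               key=lambda perm: sum(math.dist(ref_centers[j], centers[perm[j]])
                                    for j in range(c)))


def bidirectional_fcm(points, c, runs=5, ks=(3, 5, 7), m=2.0):
    n = len(points)
    total = [[0.0] * c for _ in range(n)]
    ref = None
    for r in range(runs):
        # vertical direction: repeat FCM with different random initial centers
        centers, U = fcm(points, c, m=m, seed=r)
        if ref is None:
            ref = centers
        perm = align(ref, centers)
        # horizontal direction: local memberships from several neighbourhood sizes
        for M in [U] + [local_membership(points, U, k) for k in ks]:
            for i in range(n):
                for j in range(c):
                    total[i][j] += M[i][perm[j]]
    # maximum membership principle on the combined membership matrix
    return [max(range(c), key=lambda j: total[i][j]) for i in range(n)]
```

The brute-force alignment tries all c! permutations, which is only reasonable for a small number of clusters; an assignment-problem solver would be the usual replacement for larger c.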
    Keywords:
    Single-linkage clustering
    k-medians clustering
    Data stream clustering
    Based on an analysis of the k-means and k-medians clustering algorithms, cluster analysis of a set of data objects is performed using the Chebyshev distance (i.e., the L∞-norm), yielding the novel result that the cluster center is simply the average of the maximum and minimum values of the data objects. Furthermore, a new clustering algorithm, the k-maxmins clustering algorithm, is presented. Finally, computational results of the k-maxmins, k-means, and k-medians clustering algorithms are given.
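One way to see why the center reduces to the midrange: the largest Chebyshev distance from a center c to the members of a cluster equals the maximum, over dimensions t, of max_i |x_it - c_t|, so each coordinate can be minimized on its own, and (max + min)/2 minimizes that per-coordinate maximum. The sketch below assumes a k-means-style loop (assign each point by Chebyshev distance, then move each center to the per-dimension midrange of its members); the function names and the optional `init` argument are illustrative, not the authors' code.

```python
import random


def chebyshev(a, b):
    """Chebyshev (L-infinity) distance: the largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def midrange_center(members):
    """Per-dimension average of the maximum and minimum values."""
    return [(max(col) + min(col)) / 2 for col in zip(*members)]


def k_maxmins(points, k, init=None, iters=50, seed=0):
    """k-means-style loop: assign by Chebyshev distance, update centers by midrange."""
    if init is None:
        init = random.Random(seed).sample(points, k)
    centers = [list(c) for c in init]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: chebyshev(p, centers[j])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = midrange_center(members)
    return labels, centers
```

For example, `k_maxmins([(0, 0), (1, 0), (0, 2), (10, 10), (12, 10), (10, 11)], 2, init=[(0, 0), (10, 10)])` groups the first three and last three points and places the centers at `[0.5, 1.0]` and `[11.0, 10.5]`.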
    k-medians clustering
    Single-linkage clustering
    Data stream clustering
    Citations (1)
    Data stream clustering
    Single-linkage clustering
    k-medians clustering
    Clustering high-dimensional data
    Citations (11)
    In this paper, a Two-Phase Clustering (TPC) algorithm for data sets with complex distributions is proposed. TPC contains two phases. First, the data set is partitioned into sub-clusters with spherical distributions, and each cluster center represents all the members of its corresponding sub-cluster. Then, exploiting the strong performance of Manifold Evolutionary Clustering (MEC) on data with complex distributions, the cluster centers obtained in the first phase are clustered. Finally, the final results are obtained from these two clustering results. The algorithm is based on an improved K-means and on MEC. A manifold distance is introduced into evolutionary clustering so that the algorithm can handle complex data sets, and at the same time the new method reduces the computational cost incurred by the manifold distance. Experimental results on seven artificial data sets and seven UCI data sets with different structures show that, compared with genetic-algorithm-based clustering, the K-means algorithm, and manifold evolutionary clustering, the new algorithm efficiently identifies clusters with simple or complex, convex or non-convex distributions. Furthermore, TPC is clearly faster than MEC in terms of computational time.
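A compact Python sketch of the two-phase idea, under assumptions that go beyond the abstract: phase 1 is plain K-means from caller-supplied initial centers (not the paper's improved K-means); the manifold distance between centers is taken as a shortest-path length in which a hop of Euclidean length d costs rho**d - 1, a density-adjusted edge length assumed here; and phase 2 merges centers by single linkage on that distance as a simple stand-in for MEC's evolutionary search. All function names are hypothetical.

```python
import math


def kmeans(points, centers, iters=50):
    """Plain K-means from given initial centers (phase 1: spherical sub-clusters)."""
    centers = [list(c) for c in centers]
    k = len(centers)

    def assign():
        return [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(), centers


def manifold_distances(centers, rho=2.0):
    """All-pairs shortest paths where a hop of Euclidean length d costs rho**d - 1
    (assumed density-adjusted edge length), so chains of short hops beat long jumps."""
    n = len(centers)
    D = [[rho ** math.dist(a, b) - 1 for b in centers] for a in centers]
    for m in range(n):  # Floyd-Warshall relaxation through intermediate center m
        for i in range(n):
            for j in range(n):
                if D[i][m] + D[m][j] < D[i][j]:
                    D[i][j] = D[i][m] + D[m][j]
    return D


def single_link(D, k):
    """Merge groups of centers by smallest manifold distance until k remain
    (a simple stand-in for MEC's evolutionary search)."""
    groups = [{i} for i in range(len(D))]
    while len(groups) > k:
        a, b = min(((a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))),
                   key=lambda ab: min(D[i][j] for i in groups[ab[0]] for j in groups[ab[1]]))
        merged = groups.pop(b)  # b > a, so index a is unaffected
        groups[a] |= merged
    return {i: g for g, members in enumerate(groups) for i in members}


def two_phase_clustering(points, init_centers, k, rho=2.0):
    sub_labels, centers = kmeans(points, init_centers)              # phase 1
    center_label = single_link(manifold_distances(centers, rho), k)  # phase 2
    return [center_label[s] for s in sub_labels]  # each point inherits its center's cluster
```

Because the hop cost grows exponentially with length, a chain of short hops along an elongated cluster is cheaper than one long jump across a gap, which keeps the sub-cluster centers of a non-convex cluster together. Running the cubic-time shortest-path step only on the handful of phase-1 centers, rather than on every point, is one way to see how a two-phase design can keep the manifold-distance cost down. Coordinates should be scaled first: for large distances `rho ** d` overflows a Python float and raises `OverflowError`.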
    Single-linkage clustering
    Data stream clustering
    k-medians clustering
    Clustering high-dimensional data
    Constrained clustering
    Citations (6)
    In data mining, clustering is the most popular, powerful, and commonly used unsupervised learning technique. It groups similar data objects into clusters according to a similarity measure. Clustering algorithms can be categorized into seven groups: hierarchical, density-based, partitioning, graph-based, grid-based, model-based, and combinational clustering algorithms. These algorithms give different results depending on the conditions: some clustering techniques are better suited to large data sets, while others perform well at finding clusters with arbitrary shapes. This paper studies and compares various data mining clustering algorithms. The algorithms under study are K-Means, K-Medoids, distributed K-Means, hierarchical clustering, grid-based, and density-based clustering algorithms. The paper compares all these clustering algorithms according to several factors. After this comparison, I describe which clustering algorithm should be used under different conditions to obtain the best result.
    Data stream clustering
    Single-linkage clustering
    Hierarchical clustering
    Citations (59)
    To deal with large-scale data clustering problems, an accelerated parallel K-means clustering method is presented, which first draws a random sample and then uses the max-min distance method before carrying out parallel K-means clustering. The sampling-based step avoids the problem of the clustering getting trapped in local solutions, and the max-min distance step makes the initial cluster centers tend toward the optimum. Results of a large number of experiments show that the proposed method is less affected by the initial cluster centers and improves clustering precision in both stand-alone and cluster environments. It also reduces the number of clustering iterations and the clustering time.
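A sequential Python sketch of the sampling plus max-min initialization follows. It assumes the first sampled point seeds the max-min selection and leaves out the parallel execution entirely; `sampled_kmeans`, `max_min_centers` and the `sample_size` parameter are hypothetical names.

```python
import math
import random


def max_min_centers(sample, k):
    """Max-min distance selection: start from the first sample point, then repeatedly
    add the point whose distance to its nearest chosen center is largest."""
    centers = [sample[0]]
    while len(centers) < k:
        centers.append(max(sample, key=lambda p: min(math.dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def sampled_kmeans(points, k, sample_size, iters=100, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))  # 1) random sample
    centers = max_min_centers(sample, k)                         # 2) max-min initial centers
    for _ in range(iters):                                       # 3) ordinary K-means on all points
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centers[j])
        if new == centers:
            break  # centers unchanged: converged
        centers = new
    return labels, centers
```

In a parallel version, the assignment step (computing `labels`) is the natural part to split across workers, since each point is assigned independently of the others.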
    Data stream clustering
    Single-linkage clustering
    k-medians clustering
    Clustering high-dimensional data
    Citations (0)
    The correlation clustering problem is NP-hard, and techniques for solving it can be used to cluster a data set given a relation matrix over its data. In this paper, a genetic-algorithm-based approach to the correlation clustering problem, named GeneticCC, is presented. To evaluate the performance of a clustering division, a data-correlation-based clustering precision is defined and its properties are discussed. Experimental results on a UCI document data set show that, with clustering precision as the criterion, the clustering divisions constructed by GeneticCC perform better than those constructed by a SOM neural network.
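The abstract does not define GeneticCC's encoding or its "clustering precision", so the sketch below swaps in the standard correlation-clustering cost (the number of disagreements with a +1/-1 relation matrix) as the fitness and uses a simple label-vector encoding with tournament selection, uniform crossover, random-reset mutation and elitism. It illustrates a genetic algorithm for correlation clustering in general, not GeneticCC itself; all names and parameter values are hypothetical.

```python
import random


def disagreements(labels, R):
    """Correlation-clustering cost: '+' pairs split apart plus '-' pairs put together."""
    n = len(labels)
    cost = 0
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            if (R[i][j] > 0 and not same) or (R[i][j] < 0 and same):
                cost += 1
    return cost


def genetic_cc(R, pop_size=30, generations=100, mutation=0.1, seed=0):
    rng = random.Random(seed)
    n = len(R)

    def fitness(chrom):  # lower is better
        return disagreements(chrom, R)

    # each chromosome gives every item a cluster id in 0..n-1
    pop = [[rng.randrange(n) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    history = [fitness(best)]

    def pick():  # binary tournament selection from the current population
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best solution so far always survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # crossover
            child = [rng.randrange(n) if rng.random() < mutation else g for g in child]  # mutation
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
        history.append(fitness(best))
    return best, history
```

Elitism makes the best cost in `history` non-increasing from one generation to the next. A known weakness of plain label-vector crossover is that the same cluster can carry different ids in the two parents, which is one reason GA encodings for clustering vary.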
    Single-linkage clustering
    Data stream clustering
    Clustering high-dimensional data
    k-medians clustering
    Citations (11)
    Clustering is a data mining technique used to place data elements into related groups without advance knowledge of the group definitions. Clustering is a process of partitioning a set of data into a set of meaningful subclasses, called clusters. In this paper, we review the most widely used clustering methods. First, we introduce clustering methods, how they work, and their main challenges. Second, we present the clustering methods with some comparisons, mainly covering the classical partitioning methods such as the well-known k-means algorithm, Gaussian Mixture Models and their variants, and the classical hierarchical clustering methods. Clustering algorithms can be categorized into partition-based, hierarchical, density-based, and grid-based algorithms. A partitioning clustering algorithm splits the data points into k partitions, where each partition represents a cluster. Hierarchical clustering groups similar data by constructing a hierarchy of clusters. Density-based algorithms form clusters from regions of high density, and they are one-scan algorithms. Grid-density-based algorithms use a multi-resolution grid data structure and use dense grid cells to form clusters; their main distinguishing feature is their fast processing time. In this survey paper, clustering and its different techniques in data mining are analyzed.
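To make the grid-density description concrete, here is a single-resolution, two-dimensional Python sketch: points are binned into square cells, cells holding at least `min_pts` points count as dense, and neighbouring dense cells (including diagonal neighbours) are joined into one cluster by breadth-first search. The cell size, the density threshold and the function name are assumptions; a multi-resolution method would maintain such grids at several cell sizes.

```python
import math
from collections import defaultdict, deque


def grid_density_clusters(points, cell_size, min_pts):
    """Single-resolution grid sketch: bin points into cells, keep dense cells,
    and join neighbouring dense cells (8-connectivity in 2-D) into clusters."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(idx)
    dense = {c for c, members in cells.items() if len(members) >= min_pts}
    labels = [-1] * len(points)  # -1 marks noise (points in sparse cells)
    cluster = 0
    for start in sorted(dense):
        if labels[cells[start][0]] != -1:
            continue  # cell already absorbed into an earlier cluster
        queue = deque([start])
        seen = {start}
        while queue:
            cx, cy = queue.popleft()
            for i in cells[(cx, cy)]:
                labels[i] = cluster
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster += 1
    return labels
```

Binning takes a single pass over the points, and everything after that works on cells rather than points, which is the usual explanation for why grid-based methods are fast.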
    Single-linkage clustering
    Data stream clustering
    Hierarchical clustering
    Complete-linkage clustering
    Consensus clustering
    Constrained clustering
    Citations (0)
    Clustering is the grouping of data into groups of similar objects. Each group is called a cluster; each object is similar to the other objects in its cluster and dissimilar to objects in other clusters. In this paper, we conduct an experimental study comparing clustering algorithms using multiple objective functions. We investigate K-means (a partitioning-based method), hierarchical clustering, spectral clustering, Gaussian Mixture Model clustering, and clustering using a Hidden Markov Model. The performance of these methods was compared using multiple objective functions built around two core objectives: cluster homogeneity and separation. These multiple objective functions help discover robust clusters more efficiently.
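The abstract names cluster homogeneity and separation as the two core objectives without defining them. As an assumption, the sketch below measures homogeneity as the mean distance from each point to its own cluster centroid (lower is better) and separation as the smallest distance between two centroids (higher is better); other definitions are common, and the function names are hypothetical.

```python
import math


def centroids(points, labels):
    """Mean point of each cluster, keyed by cluster label."""
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    return {l: [sum(col) / len(m) for col in zip(*m)] for l, m in groups.items()}


def homogeneity(points, labels):
    """Mean distance from each point to its own cluster centroid (lower = more homogeneous)."""
    cents = centroids(points, labels)
    return sum(math.dist(p, cents[l]) for p, l in zip(points, labels)) / len(points)


def separation(points, labels):
    """Smallest distance between two cluster centroids (higher = better separated)."""
    cents = list(centroids(points, labels).values())
    return min(math.dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])
```

For the points `(0, 0), (2, 0), (10, 0), (12, 0)`, the labelling `[0, 0, 1, 1]` scores homogeneity 1.0 and separation 10.0, while `[0, 1, 0, 1]` scores 5.0 and 2.0, so the first partition wins on both objectives. Keeping the two scores separate, rather than folding them into one number, is what allows partitions to be compared on each objective.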
    Single-linkage clustering
    k-medians clustering
    Complete-linkage clustering
    Hierarchical clustering
    Data stream clustering
    Constrained clustering
    Citations (0)