POCS-based Clustering Algorithm
Abstract:
A novel clustering technique based on the projection onto convex sets (POCS) method, called the POCS-based clustering algorithm, is proposed in this paper. The algorithm exploits the parallel projection method of POCS to find appropriate cluster prototypes in the feature space. It treats each data point as a convex set and projects the cluster prototypes onto the member data points in parallel. The projections are convexly combined so as to minimize the objective function for clustering. The performance of the proposed algorithm is verified through experiments on various synthetic datasets. The results show that the POCS-based clustering algorithm is competitive and efficient in terms of clustering error and execution speed when compared with conventional clustering methods, including the Fuzzy C-Means (FCM) and K-Means algorithms.
Keywords: Data stream clustering; Single-linkage clustering; Clustering high-dimensional data; k-medians clustering
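The projection idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: each point is a singleton convex set, so projecting a prototype onto it simply returns the point, and the distance-proportional convex weights used below are an assumption.

```python
import numpy as np

def pocs_cluster(X, init_prototypes, iters=50):
    """POCS-style clustering sketch. Each data point is a singleton
    convex set, so projecting a prototype onto it returns the point.
    A parallel projection step then moves each prototype to a convex
    combination of those projections (its member points)."""
    prototypes = np.asarray(init_prototypes, dtype=float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assignment step: each point joins its nearest prototype
        dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(prototypes)):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep an empty prototype where it is
            # convex weights proportional to prototype-to-point distance
            w = np.linalg.norm(members - prototypes[j], axis=1)
            w = np.full(len(members), 1.0 / len(members)) if w.sum() == 0 else w / w.sum()
            # parallel projection: convex combination of member points
            prototypes[j] = w @ members
    return prototypes, labels
```

Because every update is a convex combination of member points, each prototype always stays inside the convex hull of its cluster, which is what distinguishes this scheme from a plain mean update.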
Mechanisms for clustering high-dimensional data continue to appear because such data are often noisy and of low quality. Many existing clustering algorithms become ineffective when their underlying similarity measure is computed between data points in the full high-dimensional space. To address this, various projected clustering algorithms have been proposed, but most of them struggle when clusters lie in subspaces of very low dimensionality. To this end, the partition-based Improved Clustering Large Applications (ICLARA) mechanism is employed. It extends CLARA to handle data comprising many objects while reducing computing time and memory requirements. The proposed method draws several representative samples and returns the most suitable clustering as the result, making it practical for large datasets. The approach is compared with the hierarchical CURE (Clustering Using REpresentatives) and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithms and with Partitional Distance-Based Projected Clustering (PDBPC). We also measured the accuracy of all clustering techniques with respect to their parameter configurations. The experimental results show that the proposed Improved CLARA algorithm provides better accuracy than the previous methods.
Abstract: In the big data era, clustering is one of the most popular data mining methods. Most clustering algorithms suffer from complications such as manual cluster number determination, poor clustering precision, inconsistent results across datasets, and parameter dependence. A new autonomous fuzzy clustering solution, the Meskat-Mahmudul (MM) clustering algorithm, is proposed to overcome the difficulties of parameter-free automatic cluster number determination and clustering accuracy. The MM clustering algorithm finds the exact number of clusters using the average silhouette method on multivariate mixed-attribute datasets, including a real-time gene expression dataset, while handling missing values, noise, and outliers. The MM Extended K-Means (MMK) clustering algorithm is an enhancement of K-Means that provides automatic cluster discovery and runtime cluster placement. Several validation methods were used to evaluate the clusters and certify optimal partitioning. Benchmark datasets were used to compare the proposed algorithms with other algorithms in terms of time complexity and clustering efficiency. The MM and MMK clustering algorithms were found to be superior to the conventional algorithms.
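The average-silhouette criterion the MM algorithm uses to pick the number of clusters can be sketched as follows. A plain K-Means stands in for the paper's own clusterer, and `kmeans`, `mean_silhouette`, and `pick_k` are illustrative names, not the paper's API.

```python
import numpy as np

def kmeans(X, k, iters=30, seed=0):
    # plain Lloyd's K-Means used as the base clusterer
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return lab

def mean_silhouette(X, lab):
    # average silhouette width: s(i) = (b - a) / max(a, b), where a is the
    # mean intra-cluster distance and b the mean distance to the nearest
    # other cluster
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    s = []
    for i in range(len(X)):
        same = lab == lab[i]
        a = D[i, same & (np.arange(len(X)) != i)].mean() if same.sum() > 1 else 0.0
        b = min(D[i, lab == c].mean() for c in set(lab) if c != lab[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

def pick_k(X, k_range=range(2, 6)):
    # choose the k whose K-Means run maximizes the average silhouette;
    # degenerate runs (a single non-empty cluster) score -1
    def score(k):
        lab = kmeans(X, k)
        return mean_silhouette(X, lab) if len(set(lab)) > 1 else -1.0
    return max(k_range, key=score)
```

A silhouette near 1 means points sit far from neighboring clusters, so maximizing the average silhouette over candidate values of k rewards well-separated partitions.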
We think of cluster analysis as class discovery. That is, we assume that there is an unknown mapping called clustering structure that assigns a class label to each observation, and the goal of cluster analysis is to estimate this clustering structure, that is, to estimate the number of clusters and cluster assignments. In traditional cluster analysis, it is assumed that such unknown mapping is unique. However, since the observations may cluster in more than one way depending on the variables used, it is natural to permit the existence of more than one clustering structure. This generalized clustering problem of estimating multiple clustering structures is the focus of this paper. We propose an algorithm for finding multiple clustering structures of observations which involves clustering both variables and observations. The number of clustering structures is determined by the number of variable clusters. The dissimilarity measure for clustering variables is based on nearest-neighbor graphs. The observations are clustered using weighted distances with weights determined by the clusters of the variables. The motivating application is to gene expression data.
A time-sequence clustering algorithm based on edit distance is proposed in this paper. It addresses the inefficiency of existing clustering algorithms for time-sequence data, which ignore the differing time spans of the sequences. First, the algorithm computes the edit distance between every pair of time sequences to build a distance matrix. Second, for a given set of n time sequences, a forest of n binary trees is built from the distance matrix and the trees are then merged. Finally, a clustering procedure dynamically adjusts the results so that a real-time clustering structure is obtained. Experimental results demonstrate that the algorithm achieves higher efficiency and clustering quality.
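The first step, the edit-distance matrix, can be sketched directly. A standard Levenshtein dynamic program stands in here for whatever sequence distance the paper actually uses; it naturally handles sequences of different lengths, which is the motivation given in the abstract.

```python
import numpy as np

def edit_distance(s, t):
    """Classic Levenshtein DP: minimum number of insertions, deletions,
    and substitutions turning sequence s into sequence t."""
    m, n = len(s), len(t)
    dp = np.zeros((m + 1, n + 1), dtype=int)
    dp[:, 0] = np.arange(m + 1)  # delete everything from s
    dp[0, :] = np.arange(n + 1)  # insert everything from t
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i, j] = min(dp[i - 1, j] + 1,        # deletion
                           dp[i, j - 1] + 1,        # insertion
                           dp[i - 1, j - 1] + cost) # substitution/match
    return int(dp[m, n])

def distance_matrix(seqs):
    # symmetric pairwise edit distances; sequences may differ in length
    n = len(seqs)
    D = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = edit_distance(seqs[i], seqs[j])
    return D
```

The resulting matrix is what the forest of binary trees would then be built from.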
To deal with large-scale data clustering problems, an accelerated parallel K-Means clustering method is presented: the data are first randomly sampled, and max-min distance means are then used to carry out parallel K-Means clustering. The sampling step avoids convergence to poor local solutions, and the max-min distance step pushes the initial cluster centers toward near-optimal positions. Results of a large number of experiments show that the proposed method is less affected by the initial cluster centers and improves clustering precision in both stand-alone and cluster environments. It also reduces the number of iterations and the overall clustering time.
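The sample-then-max-min seeding can be sketched as below. This is a single-machine illustration of the initialization only, not the parallel clustering itself, and `maxmin_init` is a hypothetical name.

```python
import numpy as np

def maxmin_init(X, k, sample_size=None, seed=0):
    """Max-min distance seeding on a random sample: the first center is a
    random sample point; each subsequent center is the sample point whose
    distance to its nearest already-chosen center is largest."""
    rng = np.random.default_rng(seed)
    # work on a random sample to keep the O(n*k) scans cheap on big data
    S = X[rng.choice(len(X), sample_size or len(X), replace=False)]
    centers = [S[rng.integers(len(S))]]
    while len(centers) < k:
        # distance of every sample point to its nearest chosen center
        d = np.min(np.linalg.norm(S[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(S[d.argmax()])
    return np.array(centers)
```

Because each new center maximizes the distance to the existing ones, the seeds are spread across the data, which is why the abstract reports fewer iterations and less sensitivity to initialization.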
With increasing dataset sizes, improving the efficiency of the K-modes and fuzzy K-modes clustering algorithms is becoming a critical issue. To improve efficiency, a clustering method based on divide and conquer is proposed. Instead of clustering all the data at once, this method divides the dataset into several subsets and clusters each subset simultaneously; fusing the subset clusterings yields the final result. The results show that, in most cases, clustering efficiency is greatly improved compared with the traditional method.
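The divide-and-conquer scheme can be sketched as follows. K-Means on numeric data stands in for K-modes here, and the fusion step simply re-clusters the pooled sub-centers, which is one plausible fusion rule, not necessarily the paper's.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # plain Lloyd's K-Means (stand-in for K-modes on numeric data)
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return C, lab

def divide_and_conquer(X, k, n_parts=4, seed=0):
    """Split the data into subsets, cluster each subset independently,
    then cluster the pooled sub-centers to fuse the partial results."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    # cluster each subset on its own (these runs could execute in parallel)
    sub_centers = [kmeans(X[part], k, seed=seed)[0]
                   for part in np.array_split(idx, n_parts)]
    # fusion: cluster the pooled sub-centers, then label all points
    centers, _ = kmeans(np.vstack(sub_centers), k, seed=seed)
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
    return centers, labels
```

The speedup comes from the subset runs touching only a fraction of the data each, while the fusion pass works on just n_parts x k sub-centers.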
In this talk, we study some clustering algorithms with automatic selection of cluster number. Our idea is to introduce a penalty term to the objective function (i) to make the clustering process not sensitive to the initial cluster centers and (ii) to discover cluster structure in a data set. Experimental results on synthetic and real data sets are presented to demonstrate the effectiveness of the proposed algorithm. We also develop the clustering algorithm for categorical data sets and high-dimensional data sets using subspace clustering techniques. Some interesting sub-clusters and subspace clusters in data sets are discovered and reported.
Due to the high dimensionality and sparseness of text data, traditional clustering algorithms may perform poorly on text. In the proposed method, the largest dense region with a small coverage rate relative to the already-partitioned clusters is repeatedly selected as an initial cluster centroid, by learning similarity information between the partitioned and remaining sets. Once the predetermined number of initial centroids has been generated, the remaining documents are assigned to their nearest clusters. In this way, the clustering algorithm's sensitivity to the initial centroids is reduced. The threshold values used by the algorithm are computed dynamically from statistics of the dataset during clustering, avoiding the blind choice of thresholds by experience or experiment that most clustering algorithms require. Experiments on several Chinese and English datasets show that this algorithm performs better than the clustering algorithms in CLUTO.