POCS-based Clustering Algorithm
Abstract:
A novel clustering technique based on the projection onto convex sets (POCS) method, called the POCS-based clustering algorithm, is proposed in this paper. The algorithm exploits the parallel projection method of POCS to find appropriate cluster prototypes in the feature space. It treats each data point as a convex set and projects the cluster prototypes onto the member data points in parallel. The projections are convexly combined so as to minimize the objective function for clustering. The performance of the proposed algorithm is verified through experiments on various synthetic datasets. The results show that the POCS-based clustering algorithm is competitive and efficient in terms of clustering error and execution speed when compared with conventional clustering methods, including the Fuzzy C-Means (FCM) and K-Means algorithms.
Keywords: Data stream clustering; Single-linkage clustering; Clustering high-dimensional data; k-medians clustering
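The projection idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: each point is a singleton convex set, so projecting a prototype onto it simply returns the point, and the distance-proportional convex weights used below are an assumption.

```python
import numpy as np

def pocs_cluster(X, init_prototypes, iters=50):
    """POCS-style clustering sketch. Each data point is a singleton
    convex set, so projecting a prototype onto it returns the point.
    A parallel projection step then moves each prototype to a convex
    combination of those projections (its member points)."""
    prototypes = np.asarray(init_prototypes, dtype=float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assignment step: each point joins its nearest prototype
        dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(prototypes)):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep an empty prototype where it is
            # convex weights proportional to prototype-to-point distance
            w = np.linalg.norm(members - prototypes[j], axis=1)
            w = np.full(len(members), 1.0 / len(members)) if w.sum() == 0 else w / w.sum()
            # parallel projection: convex combination of member points
            prototypes[j] = w @ members
    return prototypes, labels
```

Because every update is a convex combination of member points, each prototype always stays inside the convex hull of its cluster, which is what distinguishes this scheme from a plain mean update.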
Mechanisms for clustering high-dimensional data continue to appear because such data are often noisy and of low quality. Many existing clustering algorithms become ineffective when their underlying similarity measure is computed between data points in the full high-dimensional space. To address this, various projected clustering algorithms have been proposed, but most of them struggle when clusters lie in subspaces of very low dimensionality. To this end, the partition-based Improved Clustering Large Applications (ICLARA) mechanism is employed. It extends CLARA to handle data comprising many objects while reducing computing time and memory requirements. The proposed method draws several representative samples and returns the most suitable clustering as the result, making it practical for large datasets. The approach is compared with the hierarchical CURE (Clustering Using REpresentatives) and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithms and with Partitional Distance-Based Projected Clustering (PDBPC). We also measured the accuracy of all clustering techniques with respect to their parameter configurations. The experimental results show that the proposed Improved CLARA algorithm provides better accuracy than the previous methods.
Abstract: In the big data era, clustering is one of the most popular data mining methods. Most clustering algorithms suffer from complications such as manual cluster number determination, poor clustering precision, inconsistent results across datasets, and parameter dependence. A new autonomous fuzzy clustering solution, the Meskat-Mahmudul (MM) clustering algorithm, is proposed to overcome the difficulties of parameter-free automatic cluster number determination and clustering accuracy. The MM clustering algorithm finds the exact number of clusters using the average silhouette method on multivariate mixed-attribute datasets, including a real-time gene expression dataset, while handling missing values, noise, and outliers. The MM Extended K-Means (MMK) clustering algorithm is an enhancement of K-Means that provides automatic cluster discovery and runtime cluster placement. Several validation methods were used to evaluate the clusters and certify optimal partitioning. Benchmark datasets were used to compare the proposed algorithms with other algorithms in terms of time complexity and clustering efficiency. The MM and MMK clustering algorithms were found to be superior to the conventional algorithms.
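The average-silhouette criterion the MM algorithm uses to pick the number of clusters can be sketched as follows. A plain K-Means stands in for the paper's own clusterer, and `kmeans`, `mean_silhouette`, and `pick_k` are illustrative names, not the paper's API.

```python
import numpy as np

def kmeans(X, k, iters=30, seed=0):
    # plain Lloyd's K-Means used as the base clusterer
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return lab

def mean_silhouette(X, lab):
    # average silhouette width: s(i) = (b - a) / max(a, b), where a is the
    # mean intra-cluster distance and b the mean distance to the nearest
    # other cluster
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    s = []
    for i in range(len(X)):
        same = lab == lab[i]
        a = D[i, same & (np.arange(len(X)) != i)].mean() if same.sum() > 1 else 0.0
        b = min(D[i, lab == c].mean() for c in set(lab) if c != lab[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

def pick_k(X, k_range=range(2, 6)):
    # choose the k whose K-Means run maximizes the average silhouette;
    # degenerate runs (a single non-empty cluster) score -1
    def score(k):
        lab = kmeans(X, k)
        return mean_silhouette(X, lab) if len(set(lab)) > 1 else -1.0
    return max(k_range, key=score)
```

A silhouette near 1 means points sit far from neighboring clusters, so maximizing the average silhouette over candidate values of k rewards well-separated partitions.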
We think of cluster analysis as class discovery. That is, we assume that there is an unknown mapping called clustering structure that assigns a class label to each observation, and the goal of cluster analysis is to estimate this clustering structure, that is, to estimate the number of clusters and cluster assignments. In traditional cluster analysis, it is assumed that such unknown mapping is unique. However, since the observations may cluster in more than one way depending on the variables used, it is natural to permit the existence of more than one clustering structure. This generalized clustering problem of estimating multiple clustering structures is the focus of this paper. We propose an algorithm for finding multiple clustering structures of observations which involves clustering both variables and observations. The number of clustering structures is determined by the number of variable clusters. The dissimilarity measure for clustering variables is based on nearest-neighbor graphs. The observations are clustered using weighted distances with weights determined by the clusters of the variables. The motivating application is to gene expression data.
A time-sequence clustering algorithm based on edit distance is proposed in this paper. It addresses the inefficiency of existing clustering algorithms for time-sequence data, which ignore the differing time spans of the sequences. First, the algorithm computes the edit distance between every pair of time sequences to build a distance matrix. Second, for a given set of n time sequences, a forest of n binary trees is built from the distance matrix and the trees are then merged. Finally, a clustering procedure dynamically adjusts the results so that a real-time clustering structure is obtained. Experimental results demonstrate that the algorithm achieves higher efficiency and clustering quality.
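The first step, the edit-distance matrix, can be sketched directly. A standard Levenshtein dynamic program stands in here for whatever sequence distance the paper actually uses; it naturally handles sequences of different lengths, which is the motivation given in the abstract.

```python
import numpy as np

def edit_distance(s, t):
    """Classic Levenshtein DP: minimum number of insertions, deletions,
    and substitutions turning sequence s into sequence t."""
    m, n = len(s), len(t)
    dp = np.zeros((m + 1, n + 1), dtype=int)
    dp[:, 0] = np.arange(m + 1)  # delete everything from s
    dp[0, :] = np.arange(n + 1)  # insert everything from t
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i, j] = min(dp[i - 1, j] + 1,        # deletion
                           dp[i, j - 1] + 1,        # insertion
                           dp[i - 1, j - 1] + cost) # substitution/match
    return int(dp[m, n])

def distance_matrix(seqs):
    # symmetric pairwise edit distances; sequences may differ in length
    n = len(seqs)
    D = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = edit_distance(seqs[i], seqs[j])
    return D
```

The resulting matrix is what the forest of binary trees would then be built from.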
To deal with large-scale data clustering problems, an accelerated parallel K-Means clustering method is presented: the data are first randomly sampled, and max-min distance means are then used to carry out parallel K-Means clustering. The sampling step avoids convergence to poor local solutions, and the max-min distance step pushes the initial cluster centers toward near-optimal positions. Results of a large number of experiments show that the proposed method is less affected by the initial cluster centers and improves clustering precision in both stand-alone and cluster environments. It also reduces the number of iterations and the overall clustering time.
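The sample-then-max-min seeding can be sketched as below. This is a single-machine illustration of the initialization only, not the parallel clustering itself, and `maxmin_init` is a hypothetical name.

```python
import numpy as np

def maxmin_init(X, k, sample_size=None, seed=0):
    """Max-min distance seeding on a random sample: the first center is a
    random sample point; each subsequent center is the sample point whose
    distance to its nearest already-chosen center is largest."""
    rng = np.random.default_rng(seed)
    # work on a random sample to keep the O(n*k) scans cheap on big data
    S = X[rng.choice(len(X), sample_size or len(X), replace=False)]
    centers = [S[rng.integers(len(S))]]
    while len(centers) < k:
        # distance of every sample point to its nearest chosen center
        d = np.min(np.linalg.norm(S[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(S[d.argmax()])
    return np.array(centers)
```

Because each new center maximizes the distance to the existing ones, the seeds are spread across the data, which is why the abstract reports fewer iterations and less sensitivity to initialization.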
With increasing dataset sizes, improving the efficiency of the K-modes and fuzzy K-modes clustering algorithms is becoming a critical issue. To improve efficiency, a clustering method based on divide and conquer is proposed. Instead of clustering all the data at once, this method divides the dataset into several subsets and clusters each subset simultaneously; fusing the subset clusterings yields the final result. The results show that, in most cases, clustering efficiency is greatly improved compared with the traditional method.
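The divide-and-conquer scheme can be sketched as follows. K-Means on numeric data stands in for K-modes here, and the fusion step simply re-clusters the pooled sub-centers, which is one plausible fusion rule, not necessarily the paper's.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # plain Lloyd's K-Means (stand-in for K-modes on numeric data)
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return C, lab

def divide_and_conquer(X, k, n_parts=4, seed=0):
    """Split the data into subsets, cluster each subset independently,
    then cluster the pooled sub-centers to fuse the partial results."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    # cluster each subset on its own (these runs could execute in parallel)
    sub_centers = [kmeans(X[part], k, seed=seed)[0]
                   for part in np.array_split(idx, n_parts)]
    # fusion: cluster the pooled sub-centers, then label all points
    centers, _ = kmeans(np.vstack(sub_centers), k, seed=seed)
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
    return centers, labels
```

The speedup comes from the subset runs touching only a fraction of the data each, while the fusion pass works on just n_parts x k sub-centers.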
In this talk, we study some clustering algorithms with automatic selection of cluster number. Our idea is to introduce a penalty term to the objective function (i) to make the clustering process not sensitive to the initial cluster centers and (ii) to discover cluster structure in a data set. Experimental results on synthetic and real data sets are presented to demonstrate the effectiveness of the proposed algorithm. We also develop the clustering algorithm for categorical data sets and high-dimensional data sets using subspace clustering techniques. Some interesting sub-clusters and subspace clusters in data sets are discovered and reported.
Due to the high dimensionality and sparseness of text data, traditional clustering algorithms may perform poorly on text. In the proposed method, the largest dense region with a small coverage rate relative to the already-partitioned clusters is repeatedly selected as an initial cluster centroid, by learning similarity information between the partitioned and remaining sets. Once the predetermined number of initial centroids has been generated, the remaining documents are assigned to their nearest clusters. In this way, the clustering algorithm's sensitivity to the initial centroids is reduced. The threshold values used by the algorithm are computed dynamically from statistics of the dataset during clustering, avoiding the blind choice of thresholds by experience or experiment that most clustering algorithms require. Experiments on several Chinese and English datasets show that this algorithm performs better than the clustering algorithms in CLUTO.