An Effective High Dimensional Categorical Data Clustering Method Research

Microelectronics & Computer (2011)

Abstract:

With data sets growing in size, improving the efficiency of the K-modes and fuzzy K-modes clustering algorithms is becoming a critical issue. To improve efficiency, a clustering method based on divide-and-conquer is proposed. Rather than clustering all the data at once, the method divides the data set into several subsets and clusters each subset at the same time; the fused results of the subset clusterings form the final clustering. The results show that, in most cases, clustering efficiency increases greatly compared with the traditional clustering method.
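
The abstract does not say how the subset results are fused, so the following is only a minimal sketch of the divide-and-conquer idea under stated assumptions: rows are tuples of categorical values, the subsets are clustered one after another rather than in parallel, and the fusion step (running K-modes once more on the modes collected from the subsets) is a hypothetical choice, not the authors' method. All function names are illustrative.

```python
import random
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, k, iters=20, seed=0):
    """Plain K-modes on a list of tuples; returns at most k mode vectors."""
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(rows))  # start from distinct rows only
    modes = rng.sample(distinct, min(k, len(distinct)))
    for _ in range(iters):
        groups = [[] for _ in modes]
        for row in rows:
            nearest = min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
            groups[nearest].append(row)
        # An empty group keeps its previous mode.
        new_modes = [column_modes(g) if g else modes[j] for j, g in enumerate(groups)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes


def divide_and_conquer_k_modes(rows, k, n_subsets=4, seed=0):
    """Cluster each subset separately, then fuse the subset modes (assumed fusion rule)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]

    # "Conquer": each subset is clustered independently (sequentially in this sketch).
    local_modes = []
    for subset in subsets:
        local_modes.extend(k_modes(subset, k, seed=seed))

    # "Fuse": cluster the collected modes, then label every row by its nearest final mode.
    final_modes = k_modes(local_modes, k, seed=seed)
    labels = [
        min(range(len(final_modes)), key=lambda j: hamming(row, final_modes[j]))
        for row in rows
    ]
    return final_modes, labels
```

Because each call to `k_modes` only touches one subset, the calls in the loop are independent and could be dispatched to separate workers, which is where the efficiency gain described in the abstract would come from.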

Keywords:

Single-linkage clustering

Data stream clustering

Clustering high-dimensional data

Categorical variable

k-medians clustering

Topics:

Wireless Sensor Networks and IoT

Improved hierarchical K-means clustering algorithm

Computer Engineering and Applications Journal (2013)

Hu Wei

This paper presents an improved hierarchical K-means clustering algorithm that incorporates the hierarchical structure of the space, in order to address the poor results that traditional K-means clustering gives when the number of categories is chosen randomly before clustering. After an initial K-means clustering, the algorithm uses the result to decide whether to re-cluster at a finer level. Repeating this step produces a hierarchical K-means clustering tree, and the number of clusters is selected automatically from this tree structure. Simulation results on UCI datasets demonstrate that the hierarchical K-means clustering model obtains better clustering results than the traditional K-means clustering method.
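
The abstract does not state the criterion used to decide whether a cluster is re-clustered at a finer level. The sketch below is therefore an assumption-laden illustration of the general idea only: each node is split with 2-means whenever its within-cluster sum of squared errors exceeds a hypothetical threshold `max_sse`, and the leaves of the resulting tree are taken as the final clusters, so the number of clusters is not fixed in advance. The deterministic farthest-point initialisation is also an illustrative choice.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points):
    """Within-cluster sum of squared errors around the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, iters=50):
    """K-means with k=2, seeded with two far-apart points; returns the two halves."""
    center = mean(points)
    first = max(points, key=lambda p: sq_dist(p, center))
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [first, second]
    halves = ([], [])
    for _ in range(iters):
        halves = ([], [])
        for p in points:
            side = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            halves[side].append(p)
        if not halves[0] or not halves[1]:
            break
        new_centers = [mean(halves[0]), mean(halves[1])]
        if new_centers == centers:
            break
        centers = new_centers
    return halves


def hierarchical_k_means(points, max_sse, min_size=2):
    """Recursively split clusters that are too spread out; return the leaf clusters."""
    if len(points) < min_size or sse(points) <= max_sse:
        return [points]
    left, right = two_means(points)
    if not left or not right:  # cannot be split further (e.g. identical points)
        return [points]
    return (hierarchical_k_means(left, max_sse, min_size)
            + hierarchical_k_means(right, max_sse, min_size))
```

With this rule, `len(hierarchical_k_means(points, max_sse))` plays the role of the automatically selected number of clusters; a different stopping criterion would give a different tree.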

Single-linkage clustering

Hierarchical clustering

Brown clustering

Data stream clustering

Citations (13)

A Fuzzy Approach for Text Mining

International Journal of Mathematical Sciences and Computing (2015)

Deepa B. Patil Yashwant Dongre

Document clustering is an integral and important part of text mining. There are two types of clustering, namely hard clustering and soft clustering. In hard clustering, a data item belongs to only one cluster, whereas in soft clustering a data point may fall into more than one cluster. Thus, soft clustering leads to fuzzy clustering, wherein each data point is associated with a membership function that expresses the degree to which it belongs to each cluster. Accuracy is desired in information retrieval, and it can be achieved by fuzzy clustering. In the work presented here, a fuzzy approach to text classification is used to assign documents to appropriate clusters using the Fuzzy C Means (FCM) clustering algorithm. The Enron email dataset is used for the experiments. Using the FCM clustering algorithm, emails are classified into different clusters. The results obtained are compared with the output produced by the k means clustering algorithm. The comparative study showed that the fuzzy clusters are more appropriate than hard clusters.
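
To make the hard versus soft distinction concrete, here is a minimal sketch of the standard FCM update rules on small numeric vectors. It is not the authors' text pipeline: the document representation, the choice of initial centers, and the fuzzifier value m = 2 are assumptions made for illustration. Each point receives a membership degree in every cluster, and the degrees for one point sum to 1.

```python
import math


def memberships(point, centers, m=2.0):
    """FCM membership of one point in each cluster (values sum to 1)."""
    d = [math.dist(point, c) for c in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # point coincides with a center: full membership there
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]


def fcm(points, centers, m=2.0, iters=100, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    dims = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in points]
        new_centers = []
        for i, old in enumerate(centers):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            if total == 0.0:  # degenerate case: keep the previous center
                new_centers.append(old)
                continue
            new_centers.append(tuple(
                sum(w * p[k] for w, p in zip(weights, points)) / total
                for k in range(dims)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [memberships(p, centers, m) for p in points]
```

A hard assignment can be recovered from the result by taking, for each point, the cluster with the largest membership; the fuzzy output keeps the extra information about how close each point is to the other clusters.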

Single-linkage clustering

FLAME clustering

Clustering high-dimensional data

Document Clustering

k-medians clustering

10.5815/ijmsc.2015.04.04

Citations (15)

Document clustering by fuzzy c-mean algorithm

T. Win Lin Mon

Clustering documents enables the user to have a good overall view of the information contained in the documents. Most classical clustering algorithms assign each data item to exactly one cluster, thus forming a crisp partition of the given data, but fuzzy clustering allows for degrees of membership, so that a data item can belong to different clusters to different degrees. In this system, documents are clustered using the fuzzy c-means (FCM) clustering algorithm. FCM clustering is one of the well-known unsupervised clustering techniques. However, the FCM algorithm requires the user to pre-define the number of clusters, and different numbers of clusters correspond to different fuzzy partitions, so validation of the clustering result is needed. The PBM index and F-measure are used for cluster validity.

Single-linkage clustering

FLAME clustering

Complete-linkage clustering

k-medians clustering

Consensus clustering

10.1109/icacc.2010.5487022

Citations (10)

A Novel Clustering Validity Function of FCM Clustering Algorithm

IEEE Access (2019)

Linnan Zhu Jie-Sheng Wang Hongyu Wang

Cluster analysis refers to the process of grouping a collection of physical or abstract objects into multiple classes of similar objects. Determining the optimal number of classes for a data set is the key to the clustering problem, that is, to whether the data set can be effectively partitioned. Cluster validity study is the process of establishing clustering effectiveness indicators, evaluating clustering quality, and determining the optimal number of clusters. A validity function for the fuzzy C-means (FCM) clustering algorithm is proposed, based on the ratio of intra-class compactness to inter-class separation, whose minimum represents the best clustering. The proposed validity function is then compared with known typical validity functions through simulation experiments on the related clustering performance. Three kinds of data sets are used for FCM clustering: three classical data sets, two artificial data sets, and six real data sets from the UCI database. Simulation results show that the proposed validity function can effectively partition the data set.
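
The abstract does not give the proposed validity function, so it is not reproduced here. As a hedged illustration of the same general shape (compactness divided by separation, smaller is better), the sketch below computes the well-known Xie-Beni index, which is a different, previously published validity function. The argument layout (`u[j][i]` as the membership of point j in cluster i) is an assumption of this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index: fuzzy compactness over (n times minimum center separation)."""
    n = len(points)
    c = len(centers)
    # Intra-class compactness: membership-weighted squared distances to the centers.
    compactness = sum(
        (u[j][i] ** m) * sq_dist(points[j], centers[i])
        for j in range(n)
        for i in range(c)
    )
    # Inter-class separation: smallest squared distance between two centers.
    separation = min(
        sq_dist(centers[i], centers[k])
        for i in range(c)
        for k in range(i + 1, c)
    )
    return compactness / (n * separation)
```

In a typical use, FCM is run for several candidate cluster numbers and the number giving the smallest index value is kept.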

Single-linkage clustering

Data stream clustering

Constrained clustering

k-medians clustering

10.1109/access.2019.2946599

Citations (35)

Text Mining Algorithm Based on Fuzzy Clustering

Jisuanji gongcheng (2009)

Zhiyong Liu Xinqing Geng

The main defects of the traditional FCM algorithm are that it is sensitive to isolated data points and requires the number of clusters to be known in advance. This paper presents a fuzzy clustering algorithm, NSFCM, and applies it to text mining. The algorithm adds a weight to the membership of the data in order to decrease the effect on the initial cluster centers. The paper applies average information entropy to find the number of clusters and adopts a density function algorithm to find the initial cluster centers. Experiments show that both the precision and the efficiency of NSFCM clustering are higher than those of FCM.

Single-linkage clustering

k-medians clustering

Citations (7)

A Comparative Study of clustering algorithms Using weka tools

Bharat S. Chaudhari Manan Parikh

Data clustering is the process of putting similar data into groups. A clustering algorithm partitions a data set into several groups based on the principle of maximizing intra-class similarity and minimizing inter-class similarity. This paper analyzes three major clustering algorithms (K-Means, hierarchical clustering, and density-based clustering) and compares their performance in terms of how correctly each algorithm builds clusters that match the classes. The performance of the three techniques is presented and compared using the clustering tool WEKA.

Single-linkage clustering

Data stream clustering

Hierarchical clustering

Similarity (geometry)

Consensus clustering

Citations (42)

Correlation clustering based on genetic algorithm for documents clustering

Zhenya Zhang Hongmei Cheng Wanli Chen Shuguang Zhang Qiansheng Fang

The correlation clustering problem is NP-hard, and techniques for solving it can be used to cluster a given data set from a relation matrix over its data. In this paper, an approach to the correlation clustering problem based on a genetic algorithm, named GeneticCC, is presented. To estimate the performance of a clustering division, a clustering precision based on data correlation is defined, and its features are discussed. Experimental results show that, with clustering precision as the criterion, the clustering division that GeneticCC constructs for the UCI document data set performs better than the clustering divisions constructed by an SOM neural network.

Single-linkage clustering

Data stream clustering

Clustering high-dimensional data

k-medians clustering

10.1109/cec.2008.4631230

Citations (11)

Large data sets clustering analysis based on distribution

Computer Engineering and Applications Journal (2008)

Riquan Zhang

To improve efficiency, we propose a distributed clustering algorithm for large data sets. Instead of clustering all the data at once, the data is randomly divided into several subsets, and all the subsets are clustered at the same time. Finally, the resulting clusters are combined. Experimental results show that, most of the time, the result is the same as that of the traditional clustering algorithm, and that clustering speed improves greatly.

Single-linkage clustering

Data stream clustering

k-medians clustering

Clustering high-dimensional data

Citations (0)

Procedure of Partitioning Data Into Number of Data Sets or Data Group – A Review

Communications in computer and information science (2010)

Tai-hoon Kim

Data set

Clustering high-dimensional data

Consensus clustering

10.1007/978-3-642-16444-6_15

Citations (3)

Optimization of the clusters number of an improved fuzzy C-means clustering algorithm

XU Ye-jun

Cluster analysis, an unsupervised technique, is one of the most important research topics in the field of pattern recognition. Because fuzzy clustering describes the uncertainty in assigning a sample to a category, it can reflect the real world more objectively. Traditional fuzzy clustering algorithms cannot determine the optimal number of clusters automatically. In this paper, by adopting the idea of hierarchical clustering, a new adaptive fuzzy c-means clustering algorithm, the A-FCM algorithm, is proposed that determines the optimal number of clusters automatically and efficiently. Numerical experiments show that the method performs better than other adaptive fuzzy clustering algorithms that determine the number of clusters through a variety of validity functions.

FLAME clustering

Single-linkage clustering

Hierarchical clustering

10.1109/iccse.2015.7250383

Citations (2)