Neighborhood Density based Clustering with Agglomerative Fuzzy K-Means Algorithm
Abstract:
Clustering is one of the primary tools in unsupervised learning. Clustering means creating groups of objects based on their features in such a way that objects belonging to the same group are similar and those belonging to different groups are dissimilar. K-means is one of the most widely used clustering algorithms because of its simplicity and performance, but its initial centroids are generated randomly. In this paper, we present a method for effectively selecting initial cluster centers. The method identifies high-density neighborhoods (NSS) in the data and then selects the centroids of these neighborhoods as initial centers. The agglomerative fuzzy k-means (Ak-means) clustering algorithm is then used to merge these initial centers down to the preferred number of clusters and produce better clustering results. This merging step yields more consistent clustering results across different sets of initial cluster centers. Experiments on several data sets show that the proposed approach is effective at automatically identifying the true number of clusters and producing correct clustering results.
Keywords: Single-linkage clustering; Complete-linkage clustering; Hierarchical clustering; FLAME clustering; k-medians clustering; Centroid; Brown clustering; Clustering high-dimensional data
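The abstract gives no pseudocode for the neighborhood-selection step, but the idea of seeding from high-density neighborhoods can be illustrated compactly. Below is a minimal Python sketch, assuming Euclidean distance, a fixed neighborhood radius `eps`, and a greedy rule that repeatedly takes the densest uncovered point's neighborhood; the function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def select_density_seeds(X, eps, n_seeds):
    """Greedy density-based seeding: repeatedly pick the point with the most
    neighbours inside radius eps, then remove that neighbourhood (a sketch,
    not the paper's exact NSS procedure)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    density = (D < eps).sum(axis=1)                             # neighbourhood sizes
    alive = np.ones(len(X), dtype=bool)
    seeds = []
    for _ in range(n_seeds):
        if not alive.any():
            break
        i = np.flatnonzero(alive)[np.argmax(density[alive])]    # densest remaining point
        seeds.append(X[D[i] < eps].mean(axis=0))                # centroid of its neighbourhood
        alive &= D[i] >= eps                                    # drop the covered neighbourhood
    return np.array(seeds)

# toy usage: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
print(select_density_seeds(X, eps=0.5, n_seeds=4))
```

In the paper's pipeline, seeds like these would then be merged down to the preferred cluster count by Ak-means; the sketch covers only the seeding step.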
Related Papers:
This paper presents an improved hierarchical K-means clustering algorithm that exploits the hierarchical structure of the data space, addressing the poor results that traditional K-means produces when the number of categories is chosen arbitrarily before clustering. After an initial K-means pass, the algorithm decides from the result whether to re-cluster at a finer level. Repeated execution produces a hierarchical K-means clustering tree, and the number of clusters is selected automatically on this tree structure. Simulation results on UCI datasets demonstrate that the hierarchical K-means model obtains better clustering results than traditional K-means.
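A minimal sketch of the recursive idea, using scikit-learn's KMeans and a simple within-cluster variance test as an assumed re-clustering criterion; the threshold `tol` and the bisecting rule are illustrative stand-ins for the paper's criterion.

```python
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_kmeans(X, tol=1.0, depth=0, max_depth=4):
    """Recursively bisect a cluster while its mean squared distance to the
    centroid stays above tol; leaves of the recursion are the final clusters."""
    center = X.mean(axis=0)
    spread = ((X - center) ** 2).sum(axis=1).mean()
    if spread < tol or len(X) < 4 or depth >= max_depth:
        return [X]                                   # tight enough: stop splitting
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    clusters = []
    for k in (0, 1):
        clusters += hierarchical_kmeans(X[labels == k], tol, depth + 1, max_depth)
    return clusters

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(mu, 0.2, (40, 2)) for mu in (0, 2, 5)])
print(len(hierarchical_kmeans(X, tol=0.5)))          # number of leaf clusters found
```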
The K-means algorithm clusters a dataset on the premise that the number of clusters is known and that initial cluster centers are selected randomly. In general, the value of k cannot be determined beforehand, and randomly selected initial centers make the clustering result unstable. This paper presents a new method for determining the optimal number of clusters: the number of clusters produced by affinity propagation (AP) serves as the upper limit k_max of the search range, clustering quality is analyzed with the Silhouette validity index, and initial cluster centers are set using the maximum-minimum distance algorithm. Simulation experiments and analysis demonstrate the feasibility of the algorithm.
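The two ingredients, max-min distance seeding and silhouette-based selection of k, can be sketched directly with scikit-learn. Here `k_max` is simply an assumed upper bound (the paper derives it from affinity propagation), and starting from the point farthest from the data mean is one common variant of max-min seeding.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def maxmin_centers(X, k):
    """Max-min distance seeding: start from the point farthest from the mean,
    then repeatedly add the point farthest from all chosen centers."""
    centers = [X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def best_k(X, k_max):
    """Pick k in [2, k_max] with the highest Silhouette score."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, init=maxmin_centers(X, k), n_init=1).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(mu, 0.3, (40, 2)) for mu in (0, 3, 6)])
print(best_k(X, k_max=6))   # expected: 3
```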
Clustering is an unsupervised classification method focused on grouping data into clusters. The objects in each cluster are very similar to one another but different from the objects in other clusters. Because clustering methods deal with massive amounts of information, many intelligent software agents have widely utilized clustering techniques to filter, retrieve, and categorize documents on the World Wide Web. Web mining is generally classified under data mining, and one of the significant centroid-based partitioning methods in data mining is the K-Means algorithm. One of the K-Means algorithm's challenges is its extreme sensitivity to the choice of initial cluster centers, which may cause it to get stuck in a local optimum when the initial centers are selected randomly. The K-Means++ algorithm is a variant that improves performance through smart initialization of the cluster centroids. Evolutionary techniques are widely used to optimize clustering methods by providing their prerequisite parameters; the genetic algorithm in particular is a stochastic, population-based technique applied to optimization problems. This paper proposes a genetic-based K-Means (GBKM) clustering algorithm in which the cluster centroids are encoded as chromosomes rather than chosen as random initial centers. The best cluster centers produced by the genetic algorithm, i.e., those maximizing the fitness function, serve as the initial points of the K-Means algorithm. The results show that this model increases the K-Means algorithm's performance through an appropriate choice of initial cluster centroids, compared to four other clustering algorithms.
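A compact sketch of the chromosome idea: each chromosome is a flattened set of k centroids, fitness is the negative within-cluster sum of squares, and the best chromosome seeds K-Means. This is a mutation-and-elitism sketch (no crossover); population size, mutation scale, and generation count are arbitrary illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def fitness(chrom, X, k):
    """Negative SSE of assigning each point to its nearest encoded centroid."""
    C = chrom.reshape(k, X.shape[1])
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return -d.min(axis=1).sum()

def gbkm(X, k, pop=30, gens=50, sigma=0.1, seed=3):
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    P = X[rng.integers(0, n, (pop, k))].reshape(pop, k * dim)  # random chromosomes
    for _ in range(gens):
        f = np.array([fitness(c, X, k) for c in P])
        elite = P[np.argsort(f)[-pop // 2:]]                   # keep the fitter half
        children = elite + rng.normal(0, sigma, elite.shape)   # Gaussian mutation
        P = np.vstack([elite, children])
    best = max(P, key=lambda c: fitness(c, X, k)).reshape(k, dim)
    return KMeans(n_clusters=k, init=best, n_init=1).fit(X)    # GA output seeds K-Means

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(mu, 0.3, (40, 2)) for mu in (0, 3)])
print(gbkm(X, k=2).cluster_centers_)
```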
Document clustering is an integral and important part of text mining. There are two types of clustering: hard and soft. In hard clustering, each data item belongs to exactly one cluster, whereas in soft clustering a data point may fall into more than one cluster. Soft clustering thus leads to fuzzy clustering, in which each data point is associated with a membership function expressing the degree to which it belongs to each cluster. The accuracy desired in information retrieval can be achieved by fuzzy clustering. In the work presented here, a fuzzy approach to text classification is used to classify documents into appropriate clusters with the Fuzzy C-Means (FCM) clustering algorithm. The Enron email dataset is used for the experiments: emails are classified into different clusters with FCM, and the results are compared with the output of the k-means algorithm. The comparative study showed that the fuzzy clusters are more appropriate than hard clusters.
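The membership-function idea can be made concrete with the standard FCM update equations (fuzzifier m; memberships proportional to d^(-2/(m-1))). A minimal NumPy implementation follows; the parameter values and fixed iteration count are illustrative.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=4):
    """Standard FCM: alternate the membership and center updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                     # rows are fuzzy memberships
    for _ in range(iters):
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]            # membership-weighted centers
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)          # standard FCM membership update
    return U, C

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(mu, 0.3, (30, 2)) for mu in (0, 3)])
U, C = fuzzy_c_means(X, c=2)
print(C)                      # fuzzy cluster centers
print(U[:3].round(2))         # soft memberships of the first three points
```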
Traditional K-means clustering algorithms are sensitive to the selection of initial cluster centers and to isolated points. To address these problems, this paper presents a new method based on the density of points. First, initial cluster centers are selected with the proposed method; then a K-means algorithm clusters the data, with special processing for the isolated points. The experimental results demonstrate that the proposed method obtains better clustering results.
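The isolated-point idea can be illustrated with a k-nearest-neighbour density estimate: points whose average distance to their k nearest neighbours is unusually large are set aside before seeding and clustering. The cut-off rule below (mean plus two standard deviations) is an assumed convention, not the paper's.

```python
import numpy as np

def split_isolated(X, k=5):
    """Flag isolated points by average distance to the k nearest neighbours."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)   # skip the self-distance 0
    cut = knn.mean() + 2 * knn.std()                    # assumed outlier threshold
    return X[knn <= cut], X[knn > cut]                  # (core points, isolated points)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), [[10.0, 10.0]]])  # one far outlier
core, isolated = split_isolated(X)
print(len(core), len(isolated))   # 50 1
```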
In this paper we present a new clustering method based on K-means that avoids the randomness of choosing initial centers, improving K-means with respect to its dependence on the initial value of k. First, the initial number of clusters is set to √N, where N is the number of data points. Second, a sub-cluster merging strategy combines categories, so the algorithm does not require the user to specify the number of clusters in advance. Experiments on synthetic datasets show significant improvements in clustering accuracy compared with randomly initialized K-means.
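A minimal sketch of the two steps: run K-means with √N initial clusters, then repeatedly merge the two closest centers until no pair is closer than a threshold. The merge threshold `tau` and the midpoint-merge rule are assumed stand-ins; the paper's sub-merger strategy may use a different criterion.

```python
import numpy as np
from sklearn.cluster import KMeans

def sqrt_n_then_merge(X, tau):
    k0 = max(2, int(np.sqrt(len(X))))                    # start with sqrt(N) clusters
    centers = list(KMeans(n_clusters=k0, n_init=10).fit(X).cluster_centers_)
    while len(centers) > 1:
        C = np.array(centers)
        D = np.linalg.norm(C[:, None] - C[None, :], axis=-1)
        np.fill_diagonal(D, np.inf)
        i, j = np.unravel_index(np.argmin(D), D.shape)
        if D[i, j] > tau:
            break                                        # nearest pair too far: stop merging
        merged = (centers[i] + centers[j]) / 2           # replace the pair by its midpoint
        centers = [c for t, c in enumerate(centers) if t not in (i, j)] + [merged]
    return np.array(centers)

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(mu, 0.2, (60, 2)) for mu in (0, 4)])
print(len(sqrt_n_then_merge(X, tau=1.0)))   # expected: 2
```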
One shortcoming of the fuzzy c-means clustering method is its sensitivity to the initial number and centers of clusters. To reduce this sensitivity, a partially supervised fuzzy c-means clustering method is proposed in this paper. The data is first clustered with the standard fuzzy c-means algorithm. If the clustering result does not accord with the structure of the data, one or more clusters must have been wrongly separated, leaving some clusters close to each other. These close clusters can be found by investigating the partition matrix, and they should be divided or merged; the method proposes approaches for updating the cluster number and cluster centers in both situations. With the updated cluster centers as labeled patterns, partially supervised fuzzy clustering is then carried out to produce the appropriate clusters. Experiments on four synthetic datasets and one real dataset show that the proposed method performs well compared with the standard fuzzy c-means clustering method.
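The "close clusters from the partition matrix" test can be illustrated simply: two clusters are close when many points give both of them substantial membership. The overlap measure and the 0.4 ratio below are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def close_cluster_pairs(U, thresh=0.4):
    """Return cluster pairs (i, j) that share many strongly-ambivalent points.

    U is an (n_points, n_clusters) fuzzy partition matrix; a point is
    ambivalent about clusters i and j when min(U[:, i], U[:, j]) is large."""
    c = U.shape[1]
    pairs = []
    for i in range(c):
        for j in range(i + 1, c):
            overlap = np.minimum(U[:, i], U[:, j]).mean()
            if overlap > thresh * np.maximum(U[:, i], U[:, j]).mean():
                pairs.append((i, j))
    return pairs

# toy partition matrix: clusters 0 and 1 split one real group, cluster 2 is separate
U = np.array([[0.45, 0.45, 0.10],
              [0.50, 0.40, 0.10],
              [0.40, 0.50, 0.10],
              [0.05, 0.05, 0.90]])
print(close_cluster_pairs(U))   # [(0, 1)]
```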
Hierarchical clustering has received a great deal of attention for its ability to capture hierarchical cluster structure in an unsupervised way. Despite this success, most existing hierarchical clustering algorithms have drawbacks: (1) difficulty in selecting clusters to merge or split, (2) inefficient and inaccurate cluster validation, and (3) a limitation to linearly separable clusters. To address these issues, this paper proposes a new nonlinear hierarchical clustering method termed HDenDist, based on two observations/designs involving density and min-distance. The first is that cluster centers have higher density than the points surrounding them and lie relatively far from one another; the second is a designed min-distance between nodes that determines how to divide a node of the hierarchical tree into two sub-cluster nodes. Dividing-and-ruling tricks further reduce the sensitivity to parameters, and density and distance are combined to determine when to terminate the splitting of cluster nodes. In experimental studies, the proposed method shows promising results on real datasets.
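The density-and-min-distance observation echoes the density-peaks idea: compute each point's local density ρ and its distance δ to the nearest point of higher density; points with large ρ·δ are cluster-center candidates. A minimal sketch under that assumption follows (this illustrates the general density-peaks scoring, not HDenDist itself; the cut-off radius `dc` is illustrative).

```python
import numpy as np

def density_peak_scores(X, dc):
    """Return (rho, delta): local density and distance to the nearest
    higher-density point, as in density-peaks clustering."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (D < dc).sum(axis=1).astype(float)            # local density
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return rho, delta

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(mu, 0.3, (50, 2)) for mu in (0, 4)])
rho, delta = density_peak_scores(X, dc=0.5)
print(np.argsort(rho * delta)[-2:])   # indices of the two strongest center candidates
```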