Neighborhood Density based Clustering with Agglomerative Fuzzy K-Means Algorithm
Abstract:
Clustering is one of the primary tools in unsupervised learning. Clustering means creating groups of objects based on their features in such a way that objects belonging to the same group are similar and those belonging to different groups are dissimilar. K-means is one of the most widely used clustering algorithms because of its simplicity and performance, but its initial centroids are generated randomly. In this paper, we present a method for effectively selecting initial cluster centers. The method identifies high-density neighborhoods (NSS) in the data and then selects the centroids of these neighborhoods as initial centers. The agglomerative fuzzy k-means (Ak-means) clustering algorithm is then used to merge these initial centers down to the preferred number of clusters and produce better clustering results. This merging step yields more consistent clustering results across different sets of initial cluster centers. Experiments on several data sets show that the proposed approach is effective at automatically identifying the true number of clusters and producing correct clustering results.
Keywords: Single-linkage clustering; Complete-linkage clustering; Hierarchical clustering; FLAME clustering; k-medians clustering; Centroid; Brown clustering; Clustering high-dimensional data
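The abstract gives no pseudocode for the neighborhood-selection step, but the idea of seeding from high-density neighborhoods can be illustrated compactly. Below is a minimal Python sketch, assuming Euclidean distance, a fixed neighborhood radius `eps`, and a greedy rule that repeatedly takes the densest uncovered point's neighborhood; the function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def select_density_seeds(X, eps, n_seeds):
    """Greedy density-based seeding: repeatedly pick the point with the most
    neighbours inside radius eps, then remove that neighbourhood (a sketch,
    not the paper's exact NSS procedure)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    density = (D < eps).sum(axis=1)                             # neighbourhood sizes
    alive = np.ones(len(X), dtype=bool)
    seeds = []
    for _ in range(n_seeds):
        if not alive.any():
            break
        i = np.flatnonzero(alive)[np.argmax(density[alive])]    # densest remaining point
        seeds.append(X[D[i] < eps].mean(axis=0))                # centroid of its neighbourhood
        alive &= D[i] >= eps                                    # drop the covered neighbourhood
    return np.array(seeds)

# toy usage: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
print(select_density_seeds(X, eps=0.5, n_seeds=4))
```

In the paper's pipeline, seeds like these would then be merged down to the preferred cluster count by Ak-means; the sketch covers only the seeding step.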
Related Papers:
This paper presents an improved hierarchical K-means clustering algorithm that exploits the hierarchical structure of the data space, addressing the poor results that traditional K-means produces when the number of categories is chosen arbitrarily before clustering. After an initial K-means pass, the algorithm decides from the result whether to re-cluster at a finer level. Repeated execution produces a hierarchical K-means clustering tree, and the number of clusters is selected automatically on this tree structure. Simulation results on UCI datasets demonstrate that the hierarchical K-means model obtains better clustering results than traditional K-means.
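A minimal sketch of the recursive idea, using scikit-learn's KMeans and a simple within-cluster variance test as an assumed re-clustering criterion; the threshold `tol` and the bisecting rule are illustrative stand-ins for the paper's criterion.

```python
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_kmeans(X, tol=1.0, depth=0, max_depth=4):
    """Recursively bisect a cluster while its mean squared distance to the
    centroid stays above tol; leaves of the recursion are the final clusters."""
    center = X.mean(axis=0)
    spread = ((X - center) ** 2).sum(axis=1).mean()
    if spread < tol or len(X) < 4 or depth >= max_depth:
        return [X]                                   # tight enough: stop splitting
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    clusters = []
    for k in (0, 1):
        clusters += hierarchical_kmeans(X[labels == k], tol, depth + 1, max_depth)
    return clusters

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(mu, 0.2, (40, 2)) for mu in (0, 2, 5)])
print(len(hierarchical_kmeans(X, tol=0.5)))          # number of leaf clusters found
```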
The K-means algorithm clusters a dataset on the premise that the number of clusters is known and that initial cluster centers are selected randomly. In general, the value of k cannot be determined beforehand, and randomly selected initial centers make the clustering result unstable. This paper presents a new method for determining the optimal number of clusters: the number of clusters produced by affinity propagation (AP) serves as the upper limit k_max of the search range, clustering quality is analyzed with the Silhouette validity index, and initial cluster centers are set using the maximum-minimum distance algorithm. Simulation experiments and analysis demonstrate the feasibility of the algorithm.
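The two ingredients, max-min distance seeding and silhouette-based selection of k, can be sketched directly with scikit-learn. Here `k_max` is simply an assumed upper bound (the paper derives it from affinity propagation), and starting from the point farthest from the data mean is one common variant of max-min seeding.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def maxmin_centers(X, k):
    """Max-min distance seeding: start from the point farthest from the mean,
    then repeatedly add the point farthest from all chosen centers."""
    centers = [X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def best_k(X, k_max):
    """Pick k in [2, k_max] with the highest Silhouette score."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, init=maxmin_centers(X, k), n_init=1).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(mu, 0.3, (40, 2)) for mu in (0, 3, 6)])
print(best_k(X, k_max=6))   # expected: 3
```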
Clustering is an unsupervised classification method focused on grouping data into clusters. The objects in each cluster are very similar to one another but different from the objects in other clusters. Because clustering methods deal with massive amounts of information, many intelligent software agents have widely utilized clustering techniques to filter, retrieve, and categorize documents on the World Wide Web. Web mining is generally classified under data mining, and one of the significant centroid-based partitioning methods in data mining is the K-Means algorithm. One of the K-Means algorithm's challenges is its extreme sensitivity to the choice of initial cluster centers, which may cause it to get stuck in a local optimum when the initial centers are selected randomly. The K-Means++ algorithm is a variant that improves performance through smart initialization of the cluster centroids. Evolutionary techniques are widely used to optimize clustering methods by providing their prerequisite parameters; the genetic algorithm in particular is a stochastic, population-based technique applied to optimization problems. This paper proposes a genetic-based K-Means (GBKM) clustering algorithm in which the cluster centroids are encoded as chromosomes rather than chosen as random initial centers. The best cluster centers produced by the genetic algorithm, i.e., those maximizing the fitness function, serve as the initial points of the K-Means algorithm. The results show that this model increases the K-Means algorithm's performance through an appropriate choice of initial cluster centroids, compared to four other clustering algorithms.
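A compact sketch of the chromosome idea: each chromosome is a flattened set of k centroids, fitness is the negative within-cluster sum of squares, and the best chromosome seeds K-Means. This is a mutation-and-elitism sketch (no crossover); population size, mutation scale, and generation count are arbitrary illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def fitness(chrom, X, k):
    """Negative SSE of assigning each point to its nearest encoded centroid."""
    C = chrom.reshape(k, X.shape[1])
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return -d.min(axis=1).sum()

def gbkm(X, k, pop=30, gens=50, sigma=0.1, seed=3):
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    P = X[rng.integers(0, n, (pop, k))].reshape(pop, k * dim)  # random chromosomes
    for _ in range(gens):
        f = np.array([fitness(c, X, k) for c in P])
        elite = P[np.argsort(f)[-pop // 2:]]                   # keep the fitter half
        children = elite + rng.normal(0, sigma, elite.shape)   # Gaussian mutation
        P = np.vstack([elite, children])
    best = max(P, key=lambda c: fitness(c, X, k)).reshape(k, dim)
    return KMeans(n_clusters=k, init=best, n_init=1).fit(X)    # GA output seeds K-Means

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(mu, 0.3, (40, 2)) for mu in (0, 3)])
print(gbkm(X, k=2).cluster_centers_)
```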
Document clustering is an integral and important part of text mining. There are two types of clustering: hard and soft. In hard clustering, each data item belongs to exactly one cluster, whereas in soft clustering a data point may fall into more than one cluster. Soft clustering thus leads to fuzzy clustering, in which each data point is associated with a membership function expressing the degree to which it belongs to each cluster. The accuracy desired in information retrieval can be achieved by fuzzy clustering. In the work presented here, a fuzzy approach to text classification is used to classify documents into appropriate clusters with the Fuzzy C-Means (FCM) clustering algorithm. The Enron email dataset is used for the experiments: emails are classified into different clusters with FCM, and the results are compared with the output of the k-means algorithm. The comparative study showed that the fuzzy clusters are more appropriate than hard clusters.
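The membership-function idea can be made concrete with the standard FCM update equations (fuzzifier m; memberships proportional to d^(-2/(m-1))). A minimal NumPy implementation follows; the parameter values and fixed iteration count are illustrative.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=4):
    """Standard FCM: alternate the membership and center updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                     # rows are fuzzy memberships
    for _ in range(iters):
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]            # membership-weighted centers
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)          # standard FCM membership update
    return U, C

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(mu, 0.3, (30, 2)) for mu in (0, 3)])
U, C = fuzzy_c_means(X, c=2)
print(C)                      # fuzzy cluster centers
print(U[:3].round(2))         # soft memberships of the first three points
```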
Traditional K-means clustering algorithms are sensitive to the selection of initial cluster centers and to isolated points. To address these problems, this paper presents a new method based on the density of points. First, initial cluster centers are selected with the proposed method; then a K-means algorithm clusters the data, with special processing for the isolated points. The experimental results demonstrate that the proposed method obtains better clustering results.
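The isolated-point idea can be illustrated with a k-nearest-neighbour density estimate: points whose average distance to their k nearest neighbours is unusually large are set aside before seeding and clustering. The cut-off rule below (mean plus two standard deviations) is an assumed convention, not the paper's.

```python
import numpy as np

def split_isolated(X, k=5):
    """Flag isolated points by average distance to the k nearest neighbours."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)   # skip the self-distance 0
    cut = knn.mean() + 2 * knn.std()                    # assumed outlier threshold
    return X[knn <= cut], X[knn > cut]                  # (core points, isolated points)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), [[10.0, 10.0]]])  # one far outlier
core, isolated = split_isolated(X)
print(len(core), len(isolated))   # 50 1
```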
In this paper we present a new clustering method based on K-means that avoids the randomness of choosing initial centers, improving K-means with respect to its dependence on the initial value of k. First, the initial number of clusters is set to √N, where N is the number of data points. Second, a sub-cluster merging strategy combines categories, so the algorithm does not require the user to specify the number of clusters in advance. Experiments on synthetic datasets show significant improvements in clustering accuracy compared with randomly initialized K-means.
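A minimal sketch of the two steps: run K-means with √N initial clusters, then repeatedly merge the two closest centers until no pair is closer than a threshold. The merge threshold `tau` and the midpoint-merge rule are assumed stand-ins; the paper's sub-merger strategy may use a different criterion.

```python
import numpy as np
from sklearn.cluster import KMeans

def sqrt_n_then_merge(X, tau):
    k0 = max(2, int(np.sqrt(len(X))))                    # start with sqrt(N) clusters
    centers = list(KMeans(n_clusters=k0, n_init=10).fit(X).cluster_centers_)
    while len(centers) > 1:
        C = np.array(centers)
        D = np.linalg.norm(C[:, None] - C[None, :], axis=-1)
        np.fill_diagonal(D, np.inf)
        i, j = np.unravel_index(np.argmin(D), D.shape)
        if D[i, j] > tau:
            break                                        # nearest pair too far: stop merging
        merged = (centers[i] + centers[j]) / 2           # replace the pair by its midpoint
        centers = [c for t, c in enumerate(centers) if t not in (i, j)] + [merged]
    return np.array(centers)

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(mu, 0.2, (60, 2)) for mu in (0, 4)])
print(len(sqrt_n_then_merge(X, tau=1.0)))   # expected: 2
```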
One shortcoming of the fuzzy c-means clustering method is its sensitivity to the initial number and centers of clusters. To reduce this sensitivity, a partially supervised fuzzy c-means clustering method is proposed in this paper. The data is first clustered with the standard fuzzy c-means algorithm. If the clustering result does not accord with the structure of the data, one or more clusters must have been wrongly separated, leaving some clusters close to each other. These close clusters can be found by investigating the partition matrix, and they should be divided or merged; the method proposes approaches for updating the cluster number and cluster centers in both situations. With the updated cluster centers as labeled patterns, partially supervised fuzzy clustering is then carried out to produce the appropriate clusters. Experiments on four synthetic datasets and one real dataset show that the proposed method performs well compared with the standard fuzzy c-means clustering method.
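The "close clusters from the partition matrix" test can be illustrated simply: two clusters are close when many points give both of them substantial membership. The overlap measure and the 0.4 ratio below are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def close_cluster_pairs(U, thresh=0.4):
    """Return cluster pairs (i, j) that share many strongly-ambivalent points.

    U is an (n_points, n_clusters) fuzzy partition matrix; a point is
    ambivalent about clusters i and j when min(U[:, i], U[:, j]) is large."""
    c = U.shape[1]
    pairs = []
    for i in range(c):
        for j in range(i + 1, c):
            overlap = np.minimum(U[:, i], U[:, j]).mean()
            if overlap > thresh * np.maximum(U[:, i], U[:, j]).mean():
                pairs.append((i, j))
    return pairs

# toy partition matrix: clusters 0 and 1 split one real group, cluster 2 is separate
U = np.array([[0.45, 0.45, 0.10],
              [0.50, 0.40, 0.10],
              [0.40, 0.50, 0.10],
              [0.05, 0.05, 0.90]])
print(close_cluster_pairs(U))   # [(0, 1)]
```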
Hierarchical clustering has received a great deal of attention for its ability to capture hierarchical cluster structure in an unsupervised way. Despite this success, most existing hierarchical clustering algorithms have drawbacks: (1) difficulty in selecting clusters to merge or split, (2) inefficient and inaccurate cluster validation, and (3) a limitation to linearly separable clusters. To address these issues, this paper proposes a new nonlinear hierarchical clustering method termed HDenDist, based on two observations/designs involving density and min-distance. The first is that cluster centers have higher density than the points surrounding them and lie relatively far from one another; the second is a designed min-distance between nodes that determines how to divide a node of the hierarchical tree into two sub-cluster nodes. Dividing-and-ruling tricks further reduce the sensitivity to parameters, and density and distance are combined to determine when to terminate the splitting of cluster nodes. In experimental studies, the proposed method shows promising results on real datasets.
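The density-and-min-distance observation echoes the density-peaks idea: compute each point's local density ρ and its distance δ to the nearest point of higher density; points with large ρ·δ are cluster-center candidates. A minimal sketch under that assumption follows (this illustrates the general density-peaks scoring, not HDenDist itself; the cut-off radius `dc` is illustrative).

```python
import numpy as np

def density_peak_scores(X, dc):
    """Return (rho, delta): local density and distance to the nearest
    higher-density point, as in density-peaks clustering."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (D < dc).sum(axis=1).astype(float)            # local density
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return rho, delta

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(mu, 0.3, (50, 2)) for mu in (0, 4)])
rho, delta = density_peak_scores(X, dc=0.5)
print(np.argsort(rho * delta)[-2:])   # indices of the two strongest center candidates
```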