    PCM clustering based on noise level
    Abstract:
    Possibilistic c-means (PCM) based clustering algorithms are widely used in the literature. In this paper, we develop a noise level based PCM (NPCM) clustering algorithm. The advantage of NPCM is that it does not require strong prior information about the dataset. It needs only two kinds of information that are intuitive to specify for a clustering task: the number of clusters and a property of the clusters. More specifically, NPCM has two parameters: one specifies a possibly over-specified cluster number, and the other characterizes the closeness of clusters in the clustering result. Neither parameter needs to be specified exactly. Furthermore, we find that the bandwidth update in adaptive PCM (APCM) is a positive feedback process, and that the adaptive bandwidth-uncertainty mechanism adopted in NPCM strengthens this positive feedback, which leads to a faster convergence rate. Experiments show that the clustering process can be effectively controlled by the parameters.
    Keywords:
    Closeness
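    For orientation, the typicality update at the core of PCM-type algorithms can be sketched as follows. This is a minimal sketch of the classic PCM formulation with a fixed bandwidth, not the NPCM algorithm described in the abstract; the function names, the 1-D toy data, and the choices m = 2 and eta = 1 are assumptions made only for illustration.

```python
def pcm_typicality(sq_dist, eta, m=2.0):
    """Classic PCM typicality of one point to one cluster.

    sq_dist: squared distance from the point to the cluster prototype
    eta:     bandwidth of that cluster (eta > 0), held fixed in this sketch
    m:       fuzzifier (m > 1)
    """
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))


def update_prototype(points, typicalities, m=2.0):
    """Typicality-weighted mean of 1-D points (the PCM prototype update)."""
    weights = [u ** m for u in typicalities]
    return sum(w * x for w, x in zip(weights, points)) / sum(weights)


# Hypothetical toy data: three points near 1.0 and one far outlier.
points = [0.0, 1.0, 2.0, 10.0]
prototype, eta = 1.0, 1.0

# Unlike fuzzy memberships, typicalities are not normalized across clusters,
# so the outlier simply receives a value close to zero.
u = [pcm_typicality((x - prototype) ** 2, eta) for x in points]
new_prototype = update_prototype(points, u)
```

    Because the outlier's typicality is nearly zero, it barely moves the prototype. How the bandwidth is adapted from iteration to iteration is exactly where APCM and NPCM differ from this fixed-bandwidth sketch.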
    Clustering is a technique adopted in many real-world applications. Generally, clustering can be thought of as partitioning the data into groups or subsets that contain analogous objects. Many clustering techniques, such as the K-Means algorithm, the Fuzzy C-Means (FCM) algorithm, and the spectral clustering algorithm, have been proposed in the literature. Recently, clustering algorithms have been used extensively on mixed data types to evaluate the performance of the clustering techniques. This paper presents a survey of various clustering algorithms proposed in the literature and gives insight into the advantages and limitations of some of them. A comparison of the various clustering techniques is also provided. The future enhancement section of this paper gives a general idea for improving existing clustering algorithms to achieve better clustering accuracy.
    Data stream clustering
    Biclustering
    Single-linkage clustering
    Clustering high-dimensional data
    Citations (3)
    Clustering is a method of data analysis that does not use supervised data. Clustering methods that focus on cluster size are expected to be useful for task distribution problems, and several such methods have been proposed. We proposed Fuzzy Even-sized Clustering Based on Optimization (FECBO) and COntrolled-sized Clustering Based on Optimization (COCBO) as methods that focus on cluster size. However, these methods are susceptible to noise. It is believed that this issue can be overcome by applying noise clustering, a method that assigns noisy data points to a dedicated noise cluster. In this study, we extend FECBO and COCBO with noise clustering and verify the effectiveness of the extensions through numerical examples.
    Clustering high-dimensional data
    Data stream clustering
    k-medians clustering
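    The noise-cluster idea mentioned in the abstract above can be illustrated with the standard fuzzy noise clustering membership rule, in which a virtual noise cluster sits at a constant distance from every point. This is a minimal sketch under that assumption, not the FECBO or COCBO extension itself; the function name, the value of delta, and the sample distances are hypothetical.

```python
def noise_clustering_memberships(sq_dists, delta, m=2.0):
    """Return (cluster memberships, noise membership) for one point.

    sq_dists: squared distances from the point to each cluster prototype (> 0)
    delta:    noise distance, assumed to be the same constant for every point
    m:        fuzzifier (m > 1)
    """
    p = 1.0 / (m - 1.0)
    inv = [(1.0 / d2) ** p for d2 in sq_dists]
    inv_noise = (1.0 / delta ** 2) ** p
    total = sum(inv) + inv_noise
    # Real-cluster memberships and the noise membership sum to 1.
    return [v / total for v in inv], inv_noise / total


# A point close to the first prototype, and a point far from both prototypes.
inlier, inlier_noise = noise_clustering_memberships([0.25, 16.0], delta=2.0)
outlier, outlier_noise = noise_clustering_memberships([100.0, 144.0], delta=2.0)
```

    The point that is far from every prototype ends up mostly in the noise cluster, so it contributes little to the prototype updates of the real clusters.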
    This paper proposes a novel clustering technique based on the projection onto convex sets (POCS) method, called the POCS-based clustering algorithm. The algorithm exploits the parallel projection method of POCS to find appropriate cluster prototypes in the feature space. It treats each data point as a convex set and projects the cluster prototypes onto their member data points in parallel. The projections are convexly combined to minimize the clustering objective function. The performance of the proposed algorithm is verified through experiments on various synthetic datasets. The experimental results show that it is competitive and efficient in terms of clustering error and execution speed when compared with conventional clustering methods, including the Fuzzy C-Means (FCM) and K-Means clustering algorithms.
    Data stream clustering
    k-medians clustering
    Single-linkage clustering
    Clustering high-dimensional data
    Citations (0)
    Constrained clustering
    Data stream clustering
    Single-linkage clustering
    Clustering high-dimensional data
    The clustering aggregation problem is a formal description of the clustering ensemble problem. Techniques for solving it can be used to construct a clustering division with better clustering performance when the performance of each original clustering division is fluctuating or weak. In this paper, an approach to the clustering aggregation problem based on a genetic algorithm, named GeneticCA, is presented. To estimate the clustering performance of a clustering division, clustering precision is defined and its features are discussed. In our experiments on the clustering performance of GeneticCA for document clustering, a Hamming neural network is used to construct clustering divisions with fluctuating and weak clustering performance. Experimental results show that, with clustering precision as the criterion, the clustering division constructed by GeneticCA performs better than the original clustering divisions.
    Single-linkage clustering
    Data stream clustering
    Clustering high-dimensional data
    Citations (31)
    Affinity propagation (AP) is a widely used exemplar-based clustering approach with superior efficiency and clustering quality. Nevertheless, a common issue with AP clustering is the presence of excessive exemplars, which limits its ability to perform effective aggregation. This research aims to enable AP to aggregate automatically and produce fewer, more compact clusters without changing the similarity matrix or customizing preference parameters, as existing enhanced approaches do. To this end, an automatic aggregation enhanced affinity propagation (AAEAP) clustering algorithm is proposed, which combines a dependable partitioning clustering approach with AP. Whenever the clustering stabilizes and the exemplars emerge, the partitioning clustering approach generates an additional set of results with the same number of clusters. Based on these results, mutually exclusive exemplar detection is performed on the current AP exemplars, and a pair of exemplars unsuitable for coexistence is recommended. The recommendation is then mapped to a novel constraint, called mutual exclusion and aggregation. To accommodate this constraint, a modified AP clustering model is derived and the clustering is restarted, which can reduce the number of exemplars, adjust the exemplar selection, and redistribute the other data points. Automatic detection and clustering are repeated until no mutually exclusive exemplars are detected, at which point the clustering is complete and a smaller number of clusters is obtained. Several standard classification data sets are used in experiments that compare AAEAP with other clustering algorithms, and many internal and external clustering evaluation indexes are used to measure the clustering performance. The results show that the AAEAP clustering algorithm achieves a substantial automatic aggregation effect while maintaining good clustering quality.
    Data stream clustering
    Constrained clustering
    Affinity propagation
    Single-linkage clustering
    Consensus clustering
    Clustering high-dimensional data
    Citations (1)
    Single-linkage clustering
    Constrained clustering
    Clustering high-dimensional data
    Data stream clustering
    k-medians clustering
    Clustering-based machine learning algorithms are an important field in data mining. Medical data clustering is one of the core applications of data mining, used to predict and identify the risk factors of a disease. At the same time, medical data clustering is an important and challenging task because of the complexity and high frequency of the data. To achieve proper data clustering, this paper proposes a hybrid data clustering algorithm that combines K-Means and Black Hole Entropic Fuzzy Clustering (BHEFC). K-Means is one of the earliest and most popular partition-based clustering algorithms and has a low computation cost. The hybrid algorithm has two modules. The first module, K-Means clustering, executes some number of iterations. The clustering solutions are then passed to the second module, Entropic Fuzzy Clustering, so that the hybrid gains the advantages of both algorithms. K-Means can produce a clustering solution quickly because of its low computation cost, but it can converge prematurely. To overcome this problem, the second module uses BHEFC, which can handle large amounts of high-frequency medical data. The experiments were carried out with medical practitioners to predict the risk factors of heart disease patients, so that doctors can give suggestions based on those risk factors. Finally, the efficiency of the proposed Hybrid K-Means and BHEFC is analyzed with three different performance measures.
    Data stream clustering
    Clustering high-dimensional data
    Citations (1)
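    The two-module handoff described in the abstract above can be sketched as a few hard K-Means iterations followed by a fuzzy refinement that starts from the K-Means prototypes. This is a minimal sketch only: standard Fuzzy C-Means is swapped in as a stand-in for BHEFC, whose update rules are not given here, and the function names, iteration counts, and 1-D toy data are assumptions for illustration.

```python
def kmeans_step(points, centers):
    """One hard-assignment K-Means iteration on 1-D data."""
    groups = [[] for _ in centers]
    for x in points:
        nearest = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
        groups[nearest].append(x)
    # Keep the old center if a cluster received no points.
    return [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]


def fcm_step(points, centers, m=2.0, eps=1e-12):
    """One standard Fuzzy C-Means iteration (stand-in for BHEFC)."""
    p = 1.0 / (m - 1.0)
    num = [0.0] * len(centers)
    den = [0.0] * len(centers)
    for x in points:
        # eps avoids division by zero when a point coincides with a center.
        inv = [(1.0 / ((x - c) ** 2 + eps)) ** p for c in centers]
        total = sum(inv)
        for i, v in enumerate(inv):
            w = (v / total) ** m  # fuzzy membership raised to the fuzzifier
            num[i] += w * x
            den[i] += w
    return [n / d for n, d in zip(num, den)]


def hybrid_cluster(points, centers, kmeans_iters=3, fuzzy_iters=20):
    """Module 1: a few K-Means steps. Module 2: fuzzy refinement."""
    for _ in range(kmeans_iters):
        centers = kmeans_step(points, centers)
    for _ in range(fuzzy_iters):
        centers = fcm_step(points, centers)
    return centers


# Hypothetical toy data with two well-separated groups.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
final_centers = sorted(hybrid_cluster(data, centers=[0.0, 6.0]))
```

    The cheap hard-assignment steps move the prototypes close to the groups first, and the fuzzy steps then refine them using soft memberships.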