    Clustering by hybrid K-Means and black hole entropic fuzzy clustering algorithm for medical data
    Citations: 1 | References: 15 | Related papers: 10
    Abstract:
    Clustering-based machine learning algorithms are an important field in data mining. Medical data clustering is one of the core applications of data mining, used to predict and identify the risk factors of a disease. It is also an important and challenging task because of the complexity and high frequency of the data. To achieve proper data clustering, this paper proposes a hybrid data clustering algorithm that combines K-Means and Black Hole Entropic Fuzzy Clustering (BHEFC). K-Means is one of the first and most popular partition-based clustering algorithms, and it has a low computation cost. The hybrid algorithm has two modules. The first module, K-Means clustering, runs for some number of iterations. The clustering solutions are then passed to the second module, Entropic Fuzzy Clustering, so that the hybrid gets the advantages of both algorithms. K-Means can produce a clustering solution quickly because of its low computation cost, but it is prone to premature convergence. To overcome this problem, the second module uses BHEFC, which can handle large amounts of high-frequency medical data. The experiments were carried out with medical practitioners to predict the risk factors of heart disease patients, so that doctors can give suggestions based on those risk factors. Finally, the efficiency of the proposed hybrid K-Means and BHEFC algorithm is analyzed using three different performance measures.
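The two-stage hand-off described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the black hole search step is omitted entirely, and the second stage is a generic entropy-regularized fuzzy clustering update in which each membership is proportional to exp(-d²/λ). The function names, the parameter `lam`, and the default iteration counts are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_steps(points, centers, n_iter):
    """Stage 1: a fixed number of plain K-Means (Lloyd) iterations."""
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def entropic_memberships(points, centers, lam):
    """Soft memberships u_ij proportional to exp(-d_ij^2 / lam).

    Assumption: a generic entropy-regularized form, not BHEFC itself.
    """
    memberships = []
    for p in points:
        d = [sq_dist(p, c) for c in centers]
        d_min = min(d)  # shift exponents for numerical stability
        w = [math.exp(-(dj - d_min) / lam) for dj in d]
        total = sum(w)
        memberships.append([wj / total for wj in w])
    return memberships


def entropic_fuzzy_steps(points, centers, lam, n_iter):
    """Stage 2: alternate membership and weighted-centre updates."""
    dim = len(points[0])
    for _ in range(n_iter):
        u = entropic_memberships(points, centers, lam)
        new_centers = []
        for j, old in enumerate(centers):
            weight = sum(row[j] for row in u)
            if weight > 0:
                new_centers.append([
                    sum(row[j] * p[k] for row, p in zip(u, points)) / weight
                    for k in range(dim)
                ])
            else:
                new_centers.append(old)  # no mass: leave centre unchanged
        centers = new_centers
    return centers, entropic_memberships(points, centers, lam)


def hybrid_cluster(points, k, kmeans_iters=5, fuzzy_iters=20, lam=1.0, seed=0):
    """Run a few K-Means iterations, then hand the centres to stage 2."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    centers = kmeans_steps(points, centers, kmeans_iters)
    return entropic_fuzzy_steps(points, centers, lam, fuzzy_iters)
```

Calling `hybrid_cluster(points, k)` on a list of equal-length numeric lists returns the final centres and one membership row per point. The cheap hard-assignment stage supplies the starting centres, and the soft stage then refines them; how many iterations to give each stage is a tuning choice the abstract does not specify.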
    Keywords:
    Data stream clustering
    Clustering high-dimensional data
    Single-linkage clustering
    Constrained clustering
    DBSCAN
    Citations (0)
    Data stream clustering
    Single-linkage clustering
    Clustering high-dimensional data
    Consensus clustering
    Citations (67)
    Clustering is a widely used data mining practice. It is the technique of dividing an entire dataset into clusters based on the comparable characteristics of the instances. Grouping the instances of a large database by their similarity, regardless of the database's size, is a significant part of data mining. There are many approaches to clustering, but this book mainly focuses on improving the k-Means clustering algorithm. This method partitions the input dataset into a specified number (k) of groups. It has been shown to be very efficient on small data, but on huge data its running time becomes excessive. This work mainly compares the k-Means clustering scheme with a ranking method intended to reduce the overall running time of the k-Means clustering algorithm. The experimental results confirm that the new technique is more time efficient than the conventional k-Means clustering method.
    Data stream clustering
    Single-linkage clustering
    Clustering high-dimensional data
    Constrained clustering
    Citations (4)
    In data stream clustering, redundant features degrade clustering quality, so removing them is important. To solve this problem, a data stream clustering algorithm based on feature selection (DSCFC) is proposed. It is a one-pass clustering algorithm that applies feature ranking, feature grading, redundant feature detection, and redundant feature removal, among other steps. The experimental results indicate that DSCFC can detect hidden redundant features in a data stream and remove them; when the data stream contains redundant features, the algorithm is more efficient than CluStream and its clustering quality is better.
    Data stream clustering
    Clustering high-dimensional data
    Citations (0)
    Many clustering algorithms are available in data mining. Data from different sources may be large in the number of instances and attributes and may come in different formats. Clustering algorithms are assessed by how they cluster the given data and by their parametric values. Clustering may give inappropriate results if the algorithm is not chosen wisely. This paper compares diverse clustering algorithms, namely K-Means clustering, Mini-Batch K-Means clustering, hierarchical clustering, bagging, and boosting, by applying each of them to high-dimensional datasets. After data cleaning, we clustered the datasets and compared the results of each algorithm on clustering tendency, clustering quality, a data-driven approach for evaluating the number of clusters, and the Normalized Mutual Information (NMI) metric, to give guidance on choosing an algorithm that clusters the data effectively. In the reported results, Local Clustering Coefficient (LCC) with K-Means clustering performs better than the other clustering algorithms.
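For readers unfamiliar with the NMI metric named above, a small sketch of how it can be computed from two label assignments is given here. It assumes the arithmetic-mean normalization, NMI = I(A;B) / ((H(A) + H(B)) / 2); the abstract does not say which normalization the paper uses, and the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_info(labels_a, labels_b):
    """NMI between two labelings of the same items.

    Assumption: normalizes by the arithmetic mean of the two entropies.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information from the joint and marginal label counts.
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    # Entropy of each labeling.
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())

    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings put everything in one cluster
    return mi / ((h_a + h_b) / 2)
```

The score does not depend on how clusters are numbered, so two labelings that describe the same partition with different cluster ids score the same as identical labelings, while statistically independent labelings score 0.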
    Data stream clustering
    Single-linkage clustering
    Clustering high-dimensional data
    Hierarchical clustering
    Comparability
    Citations (0)
    To improve the clustering quality of evolving data streams, this paper introduces a new data stream clustering algorithm, clustering over data Stream based on Semi-supervised Affinity Propagation (SAPStream). The algorithm computes the similarity matrix of the initial data using a semi-supervised approach, runs AP clustering, and then builds an online clustering model. As the data stream evolves, the clustering model is adjusted using a decay window technique, and the data stream clustering results are obtained by clustering again over the exemplars and newly arrived data points. SAPStream can analyze and handle large-scale evolving data streams. Its performance is tested on both real and synthetic datasets. Experimental results show that the algorithm achieves a higher clustering quality.
    Data stream clustering
    Affinity propagation
    Single-linkage clustering
    Clustering high-dimensional data
    Similarity (geometry)
    Citations (0)
    Clustering analysis is an important subject in data mining. In many real applications, the data to be clustered are high dimensional. For example, document data and DNA microarray data generally have several hundred or even a thousand dimensions. In high-dimensional space the data are usually sparsely distributed, which makes most traditional clustering algorithms that work well on low-dimensional data ineffective for high-dimensional data. To solve this problem, a new high-dimensional data clustering approach based on genetic algorithms is proposed in this paper. The search capability of genetic algorithms is exploited to find effective feature subspaces for clustering. To study the characteristics that dimensions show in clustering, the degree to which features contribute to subspace clustering is used as the fitness function. The experimental results on an artificial data set and a real-life data set, and a comparison experiment with the k-means algorithm, indicate the feasibility and efficiency of the proposed approach.
    Clustering high-dimensional data
    Data stream clustering
    Data set
    Citations (0)