    A robust fuzzy approach for gene expression data clustering
    Citations: 11 | References: 23 | Related papers: 10
    Keywords:
    Data stream clustering
    Single-linkage clustering
    k-medians clustering
    Clustering high-dimensional data
    Abstract: In the big data era, clustering is one of the most popular data mining methods. Most clustering algorithms suffer from problems such as the lack of automatic cluster number determination, poor clustering precision, inconsistent results across datasets, and parameter dependence. A new fuzzy autonomous clustering solution, the Meskat-Mahmudul (MM) clustering algorithm, is proposed to address parameter-free automatic cluster number determination and clustering accuracy. The MM clustering algorithm finds the exact number of clusters using the Average Silhouette method on multivariate mixed-attribute datasets, including real-time gene expression datasets, and handles missing values, noise, and outliers. The MM Extended K-Means (MMK) clustering algorithm is an enhancement of the K-Means algorithm that provides automatic cluster discovery and runtime cluster placement. Several validation methods are used to evaluate the clusters and to confirm optimal cluster partitioning. Several datasets are used to compare the proposed algorithms with other algorithms in terms of time complexity and clustering efficiency. The MM and MMK clustering algorithms are found to be superior to conventional algorithms.
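The Average Silhouette criterion named in the abstract can be illustrated with a short sketch. This is not the MM algorithm itself: `average_silhouette`, `select_k`, and the hand-written `toy_labelings` are hypothetical names introduced here, and in practice the candidate labelings would come from whatever clustering routine is run at each k.

```python
import math


def dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def average_silhouette(points, labels):
    """Mean silhouette width over all points; needs at least two clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a width of 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def select_k(points, cluster_fn, k_values):
    """Return the k whose labeling has the highest average silhouette."""
    scores = {k: average_silhouette(points, cluster_fn(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated 2x2 squares of points (an assumption for illustration).
points = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)]
          for dx in (0, 1) for dy in (0, 1)]

# Hand-written stand-ins for the output of a clustering algorithm at k = 2, 3, 4.
toy_labelings = {
    2: [0] * 8 + [1] * 4,                 # two squares merged
    3: [0] * 4 + [1] * 4 + [2] * 4,       # one cluster per square
    4: [0, 0, 3, 3] + [1] * 4 + [2] * 4,  # one square split in half
}

best_k, scores = select_k(points, lambda pts, k: toy_labelings[k], [2, 3, 4])
print(best_k, scores)
```

On this toy data the labeling with one cluster per square scores highest, so `select_k` picks k = 3. Passing the clustering routine as `cluster_fn` keeps the selection rule independent of the algorithm that produces the partitions.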
    Data stream clustering
    Clustering high-dimensional data
    Single-linkage clustering
    Constrained clustering
    DBSCAN
    Citations (11)
    Clustering is a significant approach in data mining that seeks to find groups, or clusters, of data. Real-world data are frequently described by both numeric and categorical features. Several different clustering algorithms have been proposed for numerical and categorical datasets. In clustering, the quality of the results is evaluated using cluster validation. This paper proposes an efficient clustering algorithm for mixed numerical and categorical data using re-clustering and cluster validation. Initially, the mixed dataset is clustered with four traditional clustering algorithms: expectation-maximization (EM), hierarchical clustering (HC), k-means (KM), and self-organizing map (SOM). These four algorithms are validated, and the best one is selected for re-clustering, an iterative process for improving the quality of the cluster results: the incorrectly clustered data are iteratively re-clustered and evaluated based on cluster validation. The performance of the proposed clustering method is evaluated on a real-time dataset in terms of purity, normalized mutual information, Rand index, precision, and recall. The experimental results show that the proposed reclust algorithm achieves better performance than the other clustering algorithms.
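Two of the validation measures listed above, purity and the Rand index, can be sketched in a few lines. The function names and the toy labels are assumptions made for illustration; they follow the standard textbook definitions, not necessarily the exact variants used in the paper.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, pred_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(true_labels)


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree.

    A pair agrees when both labelings put it in the same group,
    or both put it in different groups. Assumes at least two points.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy example: six points, two true classes, one point placed in the wrong cluster.
truth = ["a", "a", "a", "b", "b", "b"]
predicted = [0, 0, 1, 1, 1, 1]

print(purity(truth, predicted))      # 5 of 6 points match their cluster's majority class
print(rand_index(truth, predicted))  # 10 of 15 pairs are treated consistently
```

Both scores lie in [0, 1], with 1 meaning the predicted clustering matches the reference labels exactly.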
    Single-linkage clustering
    Data stream clustering
    Categorical variable
    Hierarchical clustering
    k-medians clustering
    Clustering high-dimensional data
    To deal with large-scale data clustering problems, an accelerated parallel K-means clustering method is presented that first samples the data randomly and then uses max-min distance means to carry out parallel K-means clustering. The sampling-based step helps avoid clustering that is trapped in local solutions, and the max-min distance step pushes the initial cluster centers toward the optimum. Results from a large number of experiments show that the proposed method is less affected by the initial cluster centers and improves clustering precision in both stand-alone and cluster environments. It also reduces the number of clustering iterations and the clustering time.
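A minimal sketch of the two ideas in this abstract, random sampling followed by max-min distance selection of initial centers, is given below. The abstract does not spell out the procedure, so this uses the common farthest-point reading of "max-min distance"; `max_min_centers` and `sampled_initial_centers` are hypothetical names, and the parallel K-means step itself is not shown.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def max_min_centers(points, k):
    """Pick k initial centers by the max-min distance rule.

    Start from the first point, then repeatedly add the point whose
    distance to its nearest already-chosen center is largest.
    """
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def sampled_initial_centers(points, k, sample_size, seed=0):
    """Draw a random sample first, then run max-min selection on the sample only."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return max_min_centers(sample, k)


# Toy data: three well-separated 2x2 squares of points (an assumption for illustration).
points = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)]
          for dx in (0, 1) for dy in (0, 1)]

centers_full = max_min_centers(points, 3)
centers_sampled = sampled_initial_centers(points, 3, sample_size=9)
print(centers_full)
print(centers_sampled)
```

Because each new center is as far as possible from the ones already chosen, the selected centers land in different squares on this toy data, which is the spreading effect the abstract attributes to the max-min step. Running the selection on a sample keeps its cost independent of the full data size.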
    Data stream clustering
    Single-linkage clustering
    k-medians clustering
    Clustering high-dimensional data
    Citations (0)
    The correlation clustering problem is NP-hard, and techniques for solving it can be used to cluster a data set given a relation matrix over its data. This paper presents GeneticCC, a genetic-algorithm-based approach to the correlation clustering problem. To estimate the performance of a clustering division, a clustering precision based on data correlation is defined and its features are discussed. Experimental results on a UCI document data set show that, with clustering precision as the criterion, the clustering divisions constructed by GeneticCC perform better than those constructed by a SOM neural network.
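The abstract does not give the paper's definition of clustering precision. As a stand-in, the sketch below scores a clustering against a signed relation matrix using the standard correlation clustering notion of agreement: a positive pair should share a cluster and a negative pair should not. The name `agreement_fraction` and the toy matrix are assumptions; a score of this kind could serve as the fitness function in a genetic search, which is not shown here.

```python
from itertools import combinations


def agreement_fraction(relation, labels):
    """Fraction of related pairs whose sign agrees with the clustering.

    relation[i][j] > 0 means items i and j are similar, < 0 means dissimilar,
    and 0 means no information (the pair is skipped).
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels)), 2):
        if relation[i][j] == 0:
            continue
        total += 1
        same_cluster = labels[i] == labels[j]
        if (relation[i][j] > 0) == same_cluster:
            agree += 1
    return agree / total if total else 1.0


# Toy symmetric relation matrix for four items. The +1 between items 1 and 3
# is deliberately inconsistent with the rest, so no clustering scores 1.0.
relation = [
    [0, 1, -1, -1],
    [1, 0, -1, 1],
    [-1, -1, 0, 1],
    [-1, 1, 1, 0],
]

print(agreement_fraction(relation, [0, 0, 1, 1]))  # 5 of 6 pairs agree
print(agreement_fraction(relation, [0, 0, 0, 0]))  # only the 3 positive pairs agree
```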
    Single-linkage clustering
    Data stream clustering
    Clustering high-dimensional data
    k-medians clustering
    Citations (11)
    As data sets grow, improving the efficiency of the K-modes and fuzzy K-modes clustering algorithms is becoming a critical issue. To improve efficiency, a clustering method based on divide and conquer is proposed. Rather than clustering all the data at once, the method divides the data set into several subsets and clusters each subset at the same time; the fused results of the subset clusterings form the final clustering result. The results show that, in most cases, clustering efficiency is greatly increased compared with the traditional clustering method.
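The divide-and-conquer scheme described above can be sketched as follows. The abstract does not say how the subset results are fused, so this sketch assumes one plausible choice: cluster the modes found in each subset a second time. All function names are hypothetical, the subsets are processed sequentially rather than in parallel, and the K-modes routine is a basic hard (non-fuzzy) version with a simple deterministic initialization.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(records):
    """Attribute-wise most frequent value of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, k, iters=20):
    """Basic K-modes; returns the list of cluster modes."""
    modes = []
    for r in records:  # deterministic init: the first k distinct records
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    for _ in range(iters):
        groups = [[] for _ in modes]
        for r in records:
            nearest = min(range(len(modes)), key=lambda c: hamming(r, modes[c]))
            groups[nearest].append(r)
        # Keep the old mode when a cluster ends up empty.
        new_modes = [mode_of(g) if g else m for g, m in zip(groups, modes)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes


def divide_and_conquer_k_modes(records, k, n_parts):
    """Cluster each subset separately, then fuse by clustering the local modes."""
    parts = [records[i::n_parts] for i in range(n_parts)]
    local_modes = [m for part in parts if part for m in k_modes(part, k)]
    final_modes = k_modes(local_modes, k)  # assumed fusion step
    labels = [
        min(range(len(final_modes)), key=lambda c: hamming(r, final_modes[c]))
        for r in records
    ]
    return final_modes, labels


# Toy categorical data with two obvious groups (an assumption for illustration).
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "flat"),
    ("red", "tiny", "round"),
    ("pink", "small", "round"),
    ("blue", "huge", "square"),
    ("navy", "large", "square"),
]

final_modes, labels = divide_and_conquer_k_modes(records, k=2, n_parts=2)
print(final_modes)
print(labels)
```

Each subset is clustered independently, so the per-subset calls to `k_modes` could be dispatched to separate workers; only the much smaller list of local modes has to be gathered for the fusion step.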
    Single-linkage clustering
    Data stream clustering
    Clustering high-dimensional data
    Categorical variable
    k-medians clustering
    Citations (0)
    The traditional H-K clustering algorithm addresses the randomness and a priori selection of the initial centers in the K-means clustering algorithm. However, because of its high computational complexity, it suffers from the curse of dimensionality when applied to high-dimensional datasets. Clustering ensembles apply ensemble learning to obtain a better clustering result by learning from the merged data set of multiple clustering results. The objective of this paper is to improve the performance of the traditional H-K clustering algorithm on high-dimensional datasets. Using ensemble learning, a new clustering algorithm named EPCAHK (Ensemble Principal Component Analysis Hierarchical K-means clustering algorithm) is proposed. In the EPCAHK algorithm, the high-dimensional dataset is first mapped into a low-dimensional space using PCA. Subsequently, the clustering results of the hierarchical stage, which provide initial information (e.g., the cluster number or the initial cluster centers), are integrated using the min-transitive closure method. Finally, the final clustering result is obtained by running the K-means clustering algorithm on the ensemble clustering results above. The experimental results indicate that, compared with the traditional H-K clustering algorithm, EPCAHK obtains a better clustering result. The average accuracy of the clustering results can reach 90% or above, and stability on large high-dimensional datasets is also improved.
    Ensemble Learning
    K-Means Clustering
    Citations (10)
    Clustering has been used in various disciplines, such as software engineering, statistics, data mining, image analysis, machine learning, Web cluster engines, and text mining, to deduce the groups in large volumes of data. The notion behind clustering is to assign objects to clusters in such a way that objects in one cluster are more similar to each other than to objects in other clusters. Various clustering algorithms are available, including k-means, Cobweb, DBSCAN, farthest-first, and x-means, but k-means is, on the whole, a widely used algorithm for unsupervised clustering problems. In this paper, k-means clustering is optimized using a genetic algorithm so that the problems of k-means can be overcome. The outcomes of k-means clustering and genetic k-means clustering are evaluated and compared; the results show that k-means with a genetic algorithm suggests new improvements in this research domain.
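One way a genetic algorithm can be applied to the k-means objective is sketched below. The abstract does not describe the paper's encoding or operators, so everything here is an assumption: each chromosome is a list of k candidate centers, fitness is the k-means sum of squared errors (SSE), and the population evolves through elitist selection, uniform crossover, and random-reset mutation. The name `genetic_kmeans` and its parameters are hypothetical.

```python
import random


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
        for p in points
    )


def genetic_kmeans(points, k, pop_size=10, generations=20, mutation_rate=0.2, seed=0):
    """Search for k centers with low SSE; pop_size must be at least 4."""
    rng = random.Random(seed)
    population = [rng.sample(points, k) for _ in range(pop_size)]
    history = []  # best SSE seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda ch: sse(points, ch))
        history.append(sse(points, population[0]))
        survivors = population[: pop_size // 2]  # elitism: best half kept unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            # Uniform crossover: take each center from one parent or the other.
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            if rng.random() < mutation_rate:
                # Mutation: replace one center with a random data point.
                child[rng.randrange(k)] = rng.choice(points)
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda ch: sse(points, ch))
    history.append(sse(points, best))
    return best, history


# Toy data: three well-separated 2x2 squares of points (an assumption for illustration).
points = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)]
          for dx in (0, 1) for dy in (0, 1)]

best_centers, history = genetic_kmeans(points, k=3)
print(best_centers)
print(history[0], history[-1])
```

Because the best half of the population is carried over unchanged, the best SSE never gets worse from one generation to the next. Unlike a single k-means run, the search keeps several candidate center sets alive at once, which is the usual argument for using a genetic algorithm to reduce sensitivity to initialization.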
    Single-linkage clustering
    Data stream clustering
    Clustering high-dimensional data
    k-medians clustering
    DBSCAN
    Citations (102)