Improved K-Means with Dimensionality Reduction Technique

2011 
Clustering is the process of finding groups of objects such that the objects in a group will be similar to one another and different from the objects in other groups. K-means is a well known partitioning based clustering technique that attempts to find a user specified number of clusters represented by their centroid. K-means clustering algorithm often does not work well for high dimension; hence, to improve the efficiency, we apply PCA, dimensionality reduction technique, on data set and obtain a reduced dataset containing possibly uncorrelated variables. The challenging task for any clustering method is to determine the number of clusters beforehand. To find the number of cluster, we apply EM method that finds number of clusters user should choose by determining a mixture of Gaussians that fit a given data set. Finally the experiment results shows that the use of techniques such as PCA and EM, improve the efficiency of K-means clustering.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []