Optimization of the K-means algorithm for the solution of high dimensional instances

2016 
This paper addresses the problem of clustering instances with a high number of dimensions. In particular, a new heuristic for reducing the complexity of the K-means algorithm is proposed. Traditionally, two approaches deal with the clustering of high-dimensional instances. The first executes a preprocessing step to remove attributes of limited importance. The second, called divide and conquer, creates subsets that are clustered separately and whose results are later integrated through post-processing. In contrast, this paper proposes a new solution that reduces the number of distance calculations from the objects to the centroids at the classification step. This heuristic is derived from visual observation of the K-means clustering process, in which it was found that objects can only migrate to adjacent clusters, without crossing distant clusters. Therefore, this heuristic can significantly reduce the number of distance calculations from an object to the cent...
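As a rough illustration of the idea described in the abstract, the following minimal sketch restricts the classification step so that each object is compared only with its current centroid and a few adjacent ones. The abstract does not say how adjacency is determined, so the definition used here (the `m` centroids closest to the object's current centroid) is an assumption, as are the function names `kmeans_adjacent` and `nearest_centroids`, the parameter `m`, and the counter `n_dist`. It is not the authors' implementation.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroids(centroids, m):
    """For each centroid, return its own index plus the indices of its
    m closest other centroids (assumed definition of 'adjacent')."""
    neighbors = []
    for i, c in enumerate(centroids):
        others = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: dist(c, centroids[j]),
        )
        neighbors.append([i] + others[:m])
    return neighbors


def kmeans_adjacent(points, k, m=2, iters=20, seed=0):
    """K-means in which, after a full first assignment, each object is
    only compared against its current cluster and m adjacent clusters.

    Returns (labels, centroids, n_dist), where n_dist counts the
    object-to-centroid distance calculations performed.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    # First classification step: every object against every centroid.
    labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
    n_dist = len(points) * k

    for _ in range(iters):
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

        # Restricted classification step: only adjacent clusters are candidates.
        neighbors = nearest_centroids(centroids, m)
        changed = False
        for i, p in enumerate(points):
            candidates = neighbors[labels[i]]
            n_dist += len(candidates)
            best = min(candidates, key=lambda j: dist(p, centroids[j]))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break

    return labels, centroids, n_dist
```

In this sketch each restricted classification step costs `m + 1` object-to-centroid distances per object instead of `k`, so any saving appears only when `m + 1 < k`. The centroid-to-centroid distances used to find adjacent clusters are not included in `n_dist`.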