Improvements and Implementation of Hierarchical Clustering based on Hadoop

2015 
Traditional agglomerative hierarchical clustering requires a large number of iterations, which makes its parallel implementation on Hadoop inefficient. We propose an improved hierarchical clustering method: when the between-class distance increases monotonically, the order in which classes are merged can be changed without changing the final clustering result, so multiple classes can be aggregated in a single MapReduce operation. This reduces the number of iterations and thereby improves computational efficiency. Experiments show that, compared with the traditional hierarchical clustering algorithm implemented on Hadoop, the improved algorithm implemented on Hadoop greatly reduces both the number of iterations and the computation time.
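The abstract does not spell out the merge rule, so the sketch below is an illustration of the general idea only, not the authors' algorithm. It assumes single linkage (whose merge distances never decrease and which satisfies the reducibility property) and swaps in a well-known technique, merging every pair of reciprocal nearest neighbours in the same round. Each "round" stands in for one MapReduce job; everything runs in memory, with no Hadoop involved. The function names `naive_agglomerative`, `batched_agglomerative` and `link_key` are hypothetical.

```python
import itertools
import math


def link_key(c1, c2, pts):
    """Single-linkage distance between two clusters, plus a tie-breaker.

    Assumption: the second element (the clusters' smallest point indices)
    gives a total order on pairs, so the closest pair is always unique.
    """
    d = min(math.dist(pts[i], pts[j]) for i in c1 for j in c2)
    return (d, tuple(sorted((min(c1), min(c2)))))


def naive_agglomerative(pts):
    """Classic AHC: exactly one merge per iteration (n - 1 iterations)."""
    clusters = [frozenset([i]) for i in range(len(pts))]
    formed, iterations = set(), 0
    while len(clusters) > 1:
        iterations += 1
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda pair: link_key(pair[0], pair[1], pts))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        formed.add(a | b)
    return formed, iterations


def batched_agglomerative(pts):
    """Merge every reciprocal-nearest-neighbour pair in one round.

    Finding each cluster's nearest neighbour is independent per cluster
    (map-like work); the merges are then applied together (reduce-like).
    """
    clusters = [frozenset([i]) for i in range(len(pts))]
    formed, rounds = set(), 0
    while len(clusters) > 1:
        rounds += 1
        nn = {c: min((o for o in clusters if o != c),
                     key=lambda o: link_key(c, o, pts))
              for c in clusters}
        done, next_clusters = set(), []
        for c in clusters:
            if c in done:
                continue
            partner = nn[c]
            if nn[partner] == c:       # reciprocal nearest neighbours: merge now
                merged = c | partner
                done.update((c, partner))
                next_clusters.append(merged)
                formed.add(merged)
            else:                      # no reciprocal partner: carry to next round
                next_clusters.append(c)
        clusters = next_clusters
    return formed, rounds


# Two-level toy data on a line; consecutive gaps are all distinct.
pts = [(float(x), 0.0) for x in (0, 1, 10, 12, 30, 33, 60, 64)]
naive, iterations = naive_agglomerative(pts)
batched, rounds = batched_agglomerative(pts)
print(naive == batched, iterations, rounds)  # True 7 4
```

On this toy input both versions form exactly the same set of clusters (the same hierarchy), but the batched version needs 4 rounds instead of 7 iterations, because the four tight pairs are merged together in the first round. This mirrors the trade-off the abstract describes: fewer, heavier jobs instead of one job per merge.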