Research and Implementation of a Parallel Approximate Spectral Clustering Algorithm Based on Hadoop MapReduce

2015 
With the advent of the information age, the volume of large-scale, high-dimensional data generated on the Internet is growing exponentially. Spectral clustering of such data hits a bottleneck in both computation time and memory use, particularly in the eigenvector decomposition of the Laplacian matrix. Given the advantages of the Hadoop MapReduce parallel programming model for data-intensive processing, this paper designs a Hadoop MapReduce parallel approximate spectral clustering algorithm to address this bottleneck. The algorithm is based on a Laplacian matrix built from a sparse approximation of the similarity matrix that keeps only each point's t nearest neighbours. Experiments on the UCI Bag of Words dataset validate the correctness and effectiveness of the designed algorithm. The results indicate that the parallel design achieves the desired effect, to a certain degree, in both spectral clustering quality and performance.
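The abstract does not give the algorithm's details, so the following is only a minimal single-machine sketch of the general idea, not the paper's implementation. It assumes a Gaussian similarity with a hypothetical bandwidth `sigma`, imitates the map, shuffle, and reduce phases with plain Python functions (all function names are illustrative), and builds the symmetric normalized Laplacian from the t-nearest-neighbour sparse similarity matrix. The eigenvector decomposition and the final clustering step are omitted.

```python
import heapq
import math
from collections import defaultdict


def map_similarities(i, points, sigma):
    """Map step (illustrative): emit (i, (j, s_ij)) for every other point j."""
    xi = points[i]
    for j, xj in enumerate(points):
        if j != i:
            d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            yield i, (j, math.exp(-d2 / (2.0 * sigma ** 2)))


def reduce_top_t(pairs, t):
    """Reduce step (illustrative): keep the t most similar neighbours of one row."""
    return heapq.nlargest(t, pairs, key=lambda p: p[1])


def sparse_similarity(points, t, sigma=1.0):
    """Build a symmetric t-nearest-neighbour similarity matrix as a dict of dicts."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for i in range(len(points)):
        for key, value in map_similarities(i, points, sigma):
            grouped[key].append(value)
    S = defaultdict(dict)
    for i, pairs in grouped.items():
        for j, s in reduce_top_t(pairs, t):
            if s > 0.0:  # skip similarities that underflowed to zero
                S[i][j] = s
                S[j][i] = s  # keep an edge if either endpoint selected it
    return S


def normalized_laplacian(S, n):
    """Return L = I - D^{-1/2} S D^{-1/2} in the same sparse dict-of-dicts form."""
    deg = [sum(S[i].values()) for i in range(n)]
    L = defaultdict(dict)
    for i in range(n):
        L[i][i] = 1.0
        for j, s in S[i].items():
            L[i][j] = -s / math.sqrt(deg[i] * deg[j])
    return L
```

Because each row keeps only about t entries, the memory needed for the similarity matrix grows roughly linearly with the number of points rather than quadratically, which is the motivation for the sparse approximation. Note that the map step here still computes all pairwise similarities.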