A Distributed Processing Framework of Incremental Text Clustering under the Background of Big Data

2014 
In the era of big data, data volumes are expanding rapidly, and existing incremental text clustering algorithms share a drawback: their efficiency declines sharply as time passes and data volume grows. Because of their poor timeliness and robustness, these algorithms are hard to apply in practice. In this paper, we propose a distributed framework for the Single-Pass algorithm based on MapReduce. Experimental results show that the incremental text clustering is accurate and that the framework effectively improves computing efficiency and the timeliness of results. The approach has good prospects in big data settings.
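As a rough illustration of the idea, the sketch below runs Single-Pass clustering inside each data partition (a map-style step) and then runs it again over the resulting local clusters to merge them (a reduce-style step). The abstract does not describe the paper's actual design, so everything here is an assumption made for illustration: the function names (cosine, single_pass, map_partition, reduce_clusters), the raw term-count vectors, the cosine similarity measure, and the threshold parameter. It runs in a single process and does not use a real MapReduce runtime.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(items, threshold):
    """Single-Pass clustering over (member_ids, vector) pairs, in arrival order.

    Each item joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster. A cluster keeps
    the sum of its members' vectors, which points in the same direction as
    the centroid, so cosine similarity against it is unchanged.
    """
    clusters = []
    for members, vec in items:
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["sum"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"sum": Counter(vec), "members": list(members)})
        else:
            best["sum"].update(vec)       # Counter.update adds term weights
            best["members"].extend(members)
    return clusters


def map_partition(docs, threshold):
    """Map-style step (assumed): cluster one partition of (doc_id, text) pairs."""
    items = [([doc_id], Counter(text.lower().split())) for doc_id, text in docs]
    return single_pass(items, threshold)


def reduce_clusters(local_results, threshold):
    """Reduce-style step (assumed): merge local clusters from all partitions."""
    items = [(c["members"], c["sum"]) for part in local_results for c in part]
    return single_pass(items, threshold)


if __name__ == "__main__":
    partitions = [
        [(1, "big data cluster"), (2, "big data cluster framework")],
        [(3, "big data cluster"), (4, "football match score")],
    ]
    local = [map_partition(p, 0.5) for p in partitions]
    for cluster in reduce_clusters(local, 0.5):
        print(sorted(cluster["members"]))
```

Because Single-Pass depends on arrival order, a partitioned run like this one is not guaranteed to produce the same clusters as a sequential pass over all documents; the merge step only approximates it.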