Label Propagation Based Community Detection Algorithm with Dpark

2015 
Numerous methods for detecting communities in social networks have been proposed in recent years. However, the performance and scalability of these algorithms are insufficient for real-world, large-scale social networks. In this paper, we propose the Improved Speaker-listener Label Propagation Algorithm (iSLPA), an efficient and fully distributed method for community detection. It is implemented with Dpark, a Python version of the Spark cluster computing framework. To the best of our knowledge, this is the first attempt at community detection on Dpark. iSLPA works automatically on three kinds of networks: directed networks, undirected networks, and, notably, bipartite networks. In iSLPA, we propose a new initialization and updating strategy to improve the quality and scalability of community detection. We conduct experiments on both benchmark networks and real-world user datasets from Douban (http://www.douban.com). Experimental results demonstrate that iSLPA achieves performance comparable to that of SLPA, and confirm that our algorithm is efficient and effective for overlapping community detection on large-scale networks.
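For readers unfamiliar with speaker-listener label propagation, the following is a minimal single-machine sketch of the baseline SLPA idea that iSLPA builds on. It is not the iSLPA initialization or updating strategy proposed in the paper, and it does not use Dpark. The function name `slpa`, its parameters (`iterations`, `threshold`, `seed`), and the toy graph are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict


def slpa(adjacency, iterations=20, threshold=0.2, seed=0):
    """Baseline SLPA sketch (hypothetical helper, not the paper's iSLPA).

    adjacency maps each node to the neighbors it listens to. For an
    undirected network, list every edge in both directions.
    """
    rng = random.Random(seed)
    # Each node starts with a memory holding only its own label.
    memory = {node: Counter({node: 1}) for node in adjacency}
    nodes = list(adjacency)

    for _ in range(iterations):
        rng.shuffle(nodes)
        for listener in nodes:
            received = Counter()
            for speaker in adjacency[listener]:
                # Speaker rule: send one label drawn from the speaker's
                # memory with probability proportional to its frequency.
                labels = list(memory[speaker].keys())
                weights = list(memory[speaker].values())
                received[rng.choices(labels, weights=weights)[0]] += 1
            if received:
                # Listener rule: store the most popular received label,
                # breaking ties at random.
                top = max(received.values())
                winners = sorted(l for l, c in received.items() if c == top)
                memory[listener][rng.choice(winners)] += 1

    # Post-processing: keep labels whose share of a node's memory reaches
    # the threshold. A node may keep several labels, which is what allows
    # overlapping communities.
    communities = defaultdict(set)
    for node, counts in memory.items():
        total = sum(counts.values())
        kept = [l for l, c in counts.items() if c / total >= threshold]
        if not kept:
            # Assumption: fall back to the most frequent label so that
            # every node belongs to at least one community.
            kept = [counts.most_common(1)[0][0]]
        for label in kept:
            communities[label].add(node)
    return dict(communities)


# Toy undirected graph: two disconnected triangles.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],
    3: [4, 5], 4: [3, 5], 5: [3, 4],
}
communities = slpa(graph)
for label, members in sorted(communities.items()):
    print(label, sorted(members))
```

Because labels only travel along edges, no community in this toy run can mix nodes from the two triangles. A distributed version would need to partition the per-node memories and exchange labels between workers, which is the part the paper addresses with Dpark.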