Real-time graph partition and embedding of large network

2018 
Recently, large-scale networks attract significant attention to analyze and extract the hidden information of big data. Toward this end, graph embedding is a method to embed a high dimensional graph into a much lower dimensional vector space while maximally preserving the structural information of the original network. However, effective graph embedding is particularly challenging when massive graph data are generated and processed for real-time applications. In this paper, we address this challenge and propose a new real-time and distributed graph embedding algorithm (RTDGE) that is capable of distributively embedding a large-scale graph in a streaming fashion. Specifically, our RTDGE consists of the following components: (1) a graph partition scheme that divides all edges into distinct subgraphs, where vertices are associated with edges and may belong to several subgraphs; (2) a dynamic negative sampling (DNS) method that updates the embedded vectors in real-time; and (3) an unsupervised global aggregation scheme that combines all locally embedded vectors into a global vector space. Furthermore, we also build a real-time distributed graph embedding platform based on Apache Kafka and Apache Storm. Extensive experimental results show that RTDGE outperforms existing solutions in terms of graph embedding efficiency and accuracy.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    21
    References
    4
    Citations
    NaN
    KQI
    []