Improved Partitioning Graph Embedding Framework for Small Cluster

2021 
Graph embedding is a crucial method for producing node features that can be used in various machine learning tasks. Because large graphs contain an enormous number of embedding parameters, a single machine cannot load the entire graph into GPU memory at once, so a partitioning strategy is required. However, partitioning strategies have two drawbacks. First, partitioning introduces data I/O and processing overhead, which increases training time, especially on clusters with a small number of machines. Second, partitioning can affect model performance; for multi-relation graphs, this effect is often negative. To address these problems, we propose a training pipeline and a random partition recombination method. The training pipeline reduces time overhead by overlapping data loading with GPU computation, and partition recombination effectively improves the performance of multi-relation models. We conducted experiments on multi-relation graphs and social networks, and the results show that both methods are effective.
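The abstract does not give implementation details, but the pipelining idea it describes can be sketched as a producer-consumer loop: a background thread prefetches the next partition from storage while the current partition is being trained on the GPU, so loading time is hidden behind computation. The functions `load_partition` and `train_partition` below are hypothetical placeholders standing in for the paper's actual I/O and training steps.

```python
import queue
import threading

def load_partition(pid):
    # Hypothetical loader: in practice this would read the node and
    # edge shards of partition `pid` from disk. Here it returns a
    # placeholder list standing in for embedding parameters.
    return [pid] * 4

def train_partition(data):
    # Stand-in for a GPU training step on one loaded partition.
    return sum(data)

def pipelined_training(partition_ids, depth=2):
    """Overlap partition loading with training: a background thread
    prefetches up to `depth` partitions ahead of the consumer."""
    buf = queue.Queue(maxsize=depth)

    def producer():
        for pid in partition_ids:
            buf.put(load_partition(pid))  # blocks when the buffer is full
        buf.put(None)                     # sentinel: no more partitions

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while True:
        data = buf.get()
        if data is None:
            break
        results.append(train_partition(data))
    return results
```

With a bounded queue, at most `depth` partitions are resident in host memory at once, which matters precisely in the small-cluster setting the paper targets, where per-machine memory is limited.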