Improved Partitioning Graph Embedding Framework for Small Cluster

2021 
Graph embedding is a crucial method for producing node features that can be used in various machine learning tasks. Because large graphs contain an enormous number of embedding parameters, a single machine cannot load the entire graph into GPU memory at once, so a partitioning strategy is required. However, partitioning strategies have two drawbacks. First, partitioning introduces data I/O and processing overhead, which increases training time, especially on clusters with a small number of machines. Second, partitioning can affect model performance; for multi-relation graphs, this effect is often negative. To address these problems, we propose a training pipeline and a random partition recombination method. The training pipeline reduces time overhead by overlapping data loading with GPU computation, and partition recombination effectively improves the performance of multi-relation models. We conducted experiments on multi-relation graphs and social networks, and the results show that both methods are effective.
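The abstract does not give implementation details, but the pipelining idea it describes can be sketched as a producer-consumer loop: a background thread prefetches the next partition from storage while the current partition is being trained on the GPU, so loading time is hidden behind computation. The functions `load_partition` and `train_partition` below are hypothetical placeholders standing in for the paper's actual I/O and training steps.

```python
import queue
import threading

def load_partition(pid):
    # Hypothetical loader: in practice this would read the node and
    # edge shards of partition `pid` from disk. Here it returns a
    # placeholder list standing in for embedding parameters.
    return [pid] * 4

def train_partition(data):
    # Stand-in for a GPU training step on one loaded partition.
    return sum(data)

def pipelined_training(partition_ids, depth=2):
    """Overlap partition loading with training: a background thread
    prefetches up to `depth` partitions ahead of the consumer."""
    buf = queue.Queue(maxsize=depth)

    def producer():
        for pid in partition_ids:
            buf.put(load_partition(pid))  # blocks when the buffer is full
        buf.put(None)                     # sentinel: no more partitions

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while True:
        data = buf.get()
        if data is None:
            break
        results.append(train_partition(data))
    return results
```

With a bounded queue, at most `depth` partitions are resident in host memory at once, which matters precisely in the small-cluster setting the paper targets, where per-machine memory is limited.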