GraBi: Communication-Efficient and Workload-Balanced Partitioning for Bipartite Graphs

2020 
Machine Learning and Data Mining (MLDM) applications, such as recommendation and topic modeling, generally represent their input data in bipartite graphs with two disjoint vertex-subsets connected only by edges between them. Despite the prevalence of bipartite graphs, existing graph partitioning frameworks have rarely sufficiently exploited their unique structures, especially the highly lopsided subset sizes and extremely skewed vertex degrees. As a result of poor partitioning quality, problems, particularly of high communication cost and severe workload imbalance, arise during subsequent computation over these bipartite graphs in distributed environments such as datacenters or HPC systems, significantly hampering the performance of MLDM applications. In this paper, we approach these problems by communication-efficient and workload-balanced partitioning of bipartite graphs, which fully exploits the vertex vectorization in MLDM algorithms and inherent asymmetry in bipartite graphs. To this end, we present GraBi, a two-stage partitioning framework that partitions a bipartite graph first vertically and then horizontally. The first partitioning stage divides each vectored vertex into multiple vertex-chunks such that the bipartite graph is vertically partitioned into multiple layers, to strike an appropriate tradeoff between inter- and intra-vertex communication. In the second partitioning stage, for each layer, the vertex-chunks in the larger vertex-subset are first assigned to nodes, to minimize vertex replicas. To be specific, these vertex-chunks are horizontally decomposed into one or more sub-chunks with an upper-bounded number of edges, and then the sub-chunks are evenly assigned over nodes of a distributed system using a set of hash functions, to achieve workload balance among all computing nodes. GraBi is a lightweight partitioning framework for bipartite graphs, and can be generalizable to most MLDM applications. Our evaluation, driven by real-world bipartite graphs processed in an 8-node cluster, shows that GraBi significantly improves the partitioning quality for bipartite graphs. Particularly, it decreases the computation time of MLDM algorithms by up to 5.41x, 4.32x, 1.89x over three state-of-the-art partitioning frameworks Hybrid-cut, Bi-cut, and 3D-partitioner respectively.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    23
    References
    0
    Citations
    NaN
    KQI
    []