VPC: Pruning connected components using vector-based path compression for Graph500

2021 
Graphs are an effective approach for data representation and organization, and graph analysis is a promising killer application for AI systems. However, recently emerging extremely large graphs (consisting of trillions of vertices and edges) exceed the capacity of any small-/medium-scale clusters and thus necessitate the adoption of supercomputers for efficient graph processing. Graph500 is the de facto standard for benchmarking supercomputers’ graph processing performance, and connected component (CC) is an important basic algorithm for Graph500’s BFS and SSSP tests. However, current CC algorithms are inefficient on supercomputers and fast CC is expensive and challenging. In this paper, we propose VPC, an efficient method that prunes connected components using vector-based path compression. It includes the following innovations: (i) The data structure of the traversal algorithm is customized with the two-dimensional adjacency vector. (ii) The vector-based path compression is proposed for the union-find algorithm. (iii) Parallel VPC is proposed customized with Tianhe. Experimental results validate that the two-dimensional adjacency vector has better performance than other data structures and the vector-based path compression is used in the realization of the union-find algorithm. When the scale is 26, the performance of our algorithm is 1.38 $$\times$$ , 1.69 $$\times$$ and 2.57 $$\times$$ that of other algorithms. The union-find algorithm is used for connected components, and the performance of the algorithm is 5.14 $$\times$$ and 5.01 $$\times$$ that of BFS and DFS respectively.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    51
    References
    0
    Citations
    NaN
    KQI
    []