Highly Efficient Breadth-First Search on CPU-Based Single-Node System

2019 
With the rapid development of computer technology, the scale of graph increases explosively and large-scale graph computing has been the focus in recent years. Breadth-First Search (BFS) is one of the most important kernels in graph computing. It is the main kernel of the Graph500 benchmark that evaluates the performance of supercomputers and servers in terms of data-intensive applications. In this paper, we implement a highly efficient parallel BFS on CPU-based single-node systems. We propose two algorithm optimization techniques: round-robin vertex shuffle and jumping access. The former can improve the load balance for parallel implementation, while the latter can reduce unnecessary data accesses in the bottom-up algorithm. Our new implementation demonstrates strong scalability and achieves 65.2 GTEPS for Kronecker graph with 2^30 vertices and 2^34 edges on a 4-way Xeon server. The speedup is nearly 12.8 times faster than the original direction-optimizing algorithm. In terms of energy efficiency, the result on our self-designed ARM server is 254.6 MTEPS/W and is superior to conventional x86 server. To our best knowledge, this is the first work that evaluates BFS performance on ARM server and ARM server is suitable for data-intensive applications such as large-scale graph processing.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    2
    Citations
    NaN
    KQI
    []