Characterizing Performance of Graph Neighborhood Communication Patterns

2021 
Distributed-memory graph algorithms are fundamental enablers in scientific computing and analytics workflows. A majority of graph algorithms rely on the graph neighborhood communication pattern, i.e., repeated asynchronous communication between a vertex and its neighbors in the graph. The pattern is adversarial for communication software and hardware due to high message injection rates and input-dependent, many-to-one traffic with variable destinations and volumes. We present benchmarks and performance analysis of graph neighborhood communication on modern large-scale network interconnects from four supercomputers: ALCF Theta, NERSC Cori, OLCF Summit, and R-CCS Fugaku. Our benchmarks characterize communication from the perspectives of latency and throughput. Benchmark parameters make it possible to mimic the behaviors of complex applications on real-world or synthetic graphs by varying work distribution, remote edges, message volume, and per-vertex work. We find that minor changes in the input graph can substantially increase latencies, and that contention can develop in memory caches and network stacks before contention in the network itself. Further, latencies and contention vary significantly across graph neighborhoods, motivating the need to explore asynchronous algorithms in greater detail. When adding work, load imbalance on real-world graphs can be pronounced: 99th-percentile latencies were 8–128× higher than the corresponding average latencies. Our results help analysts and developers understand the performance implications of this important pattern, especially for the impending exascale platforms.
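For concreteness, the sketch below illustrates the graph neighborhood communication pattern described in the abstract: each rank issues one small nonblocking message per local vertex to the rank owning that vertex's neighbor, producing high injection rates and input-dependent, many-to-one traffic. This is not the paper's benchmark; the block vertex distribution, random neighbor assignment, vertex count, and message size are illustrative assumptions.

```cpp
// Minimal sketch of the graph neighborhood communication pattern with MPI.
// Assumptions (not from the paper): vertices are block-distributed, each
// local vertex has one pseudo-randomly chosen remote neighbor, and payloads
// are a few integers. Compile with: mpicxx -O2 neighborhood_sketch.cpp
#include <mpi.h>
#include <cstdio>
#include <random>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int verts_per_rank = 1 << 12;  // local vertices (assumed parameter)
    const int msg_words      = 4;        // payload per edge message (assumed)

    // Choose a destination rank (neighbor owner) for every local vertex.
    std::mt19937 gen(12345 + rank);
    std::uniform_int_distribution<int> dist(0, nprocs - 1);
    std::vector<int> dest(verts_per_rank);
    std::vector<int> send_count(nprocs, 0);
    for (int v = 0; v < verts_per_rank; ++v) {
        dest[v] = dist(gen);
        ++send_count[dest[v]];
    }

    // Exchange counts so each rank knows how many messages to expect;
    // received volume is input-dependent and typically imbalanced.
    std::vector<int> recv_count(nprocs, 0);
    MPI_Alltoall(send_count.data(), 1, MPI_INT,
                 recv_count.data(), 1, MPI_INT, MPI_COMM_WORLD);
    long long expected = 0;
    for (int p = 0; p < nprocs; ++p) expected += recv_count[p];

    std::vector<int> payload(msg_words, rank);
    std::vector<MPI_Request> reqs;
    reqs.reserve(verts_per_rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    // High injection rate: one small nonblocking send per local vertex/edge.
    for (int v = 0; v < verts_per_rank; ++v) {
        MPI_Request r;
        MPI_Isend(payload.data(), msg_words, MPI_INT, dest[v],
                  /*tag=*/0, MPI_COMM_WORLD, &r);
        reqs.push_back(r);
    }

    // Drain the expected number of incoming messages from any source.
    std::vector<int> inbox(msg_words);
    for (long long got = 0; got < expected; ++got) {
        MPI_Recv(inbox.data(), msg_words, MPI_INT, MPI_ANY_SOURCE,
                 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(),
                MPI_STATUSES_IGNORE);

    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();
    if (rank == 0)
        std::printf("elapsed: %.3f ms for %d sends per rank\n",
                    (t1 - t0) * 1e3, verts_per_rank);

    MPI_Finalize();
    return 0;
}
```

Skewing the random destination distribution (e.g., toward a few "hub" ranks) is one way such a sketch can mimic the many-to-one hotspots and load imbalance that the abstract attributes to real-world graphs.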