Systematically Understanding Graph Accelerator Dimensions and the Value of Hardware Flexibility

2022 
Because of the importance of graph workloads and the limitations of central processing units/graphics processing units (CPUs/GPUs), many graph-processing accelerators have been proposed. Most prior such accelerators adopt a single fixed algorithm. While helpful for specialization, this leaves performance potential from flexibility on the table and also complicates understanding the relationship between graph types, workloads, algorithms, and specialization. In this work, we explore the value of flexibility in graph-processing accelerators. Our approach is to identify a taxonomy of key algorithm variants, and develop a modular architecture, PolyGraph, which is flexible across them. The key to flexibility is our novel Taskflow execution model, which unifies task and dataflow parallelism. Overall, we find that flexibility is essential; PolyGraph outperforms similarly provisioned GPUs by mean 49.6× (up to 275×), and the best prior accelerator by mean 5.7×.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    10
    References
    0
    Citations
    NaN
    KQI
    []