Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

2021 
We present a distributed-memory algorithm for sparse matrix-matrix multiplication (SpGEMM) of extremely large matrices whose output is larger than the aggregated memory of a target supercomputer. We address this challenge by splitting the computation into batches, with each batch generating a set of output columns. We developed a distributed symbolic step to estimate the memory requirement and determine the number of batches beforehand. We integrated the multiplication in each batch with existing communication-avoiding techniques to reduce the communication overhead when multiplying matrices on a 3-D process grid. Furthermore, we made the in-node computations faster by designing sort-free SpGEMM and merging algorithms. Incorporating all the proposed approaches, our SpGEMM scales to large protein-similarity networks using 262,144 cores on a Cray XC40 supercomputer, achieving a 10x speedup with 16x more nodes. Our code is available as part of the Combinatorial BLAS library (https://github.com/PASSIONLab/CombBLAS).
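To illustrate the batching idea described above, the following is a minimal, single-node sketch (not the CombBLAS implementation) of column-batched SpGEMM with a symbolic pass that upper-bounds the output size per column and groups columns into batches that fit a memory budget; the function name `batched_spgemm` and the parameter `max_nnz_per_batch` are hypothetical, and SciPy CSC matrices stand in for the distributed data structures.

```python
# Illustrative sketch of memory-constrained, column-batched SpGEMM.
# Assumes SciPy CSC matrices on a single node; the real algorithm runs on a
# 3-D process grid with distributed symbolic and numeric phases.
import numpy as np
import scipy.sparse as sp

def batched_spgemm(A, B, max_nnz_per_batch=1_000_000):
    A = A.tocsc()
    B = B.tocsc()
    # Symbolic step: upper-bound nnz of output column C[:, j] by summing
    # nnz(A[:, k]) over the nonzero rows k of B[:, j].
    nnz_A_cols = np.diff(A.indptr)
    est = np.array([nnz_A_cols[B.indices[B.indptr[j]:B.indptr[j + 1]]].sum()
                    for j in range(B.shape[1])])
    # Group consecutive output columns into batches that fit the budget.
    batches, start, acc = [], 0, 0
    for j, e in enumerate(est):
        if acc + e > max_nnz_per_batch and j > start:
            batches.append((start, j))
            start, acc = j, 0
        acc += e
    batches.append((start, B.shape[1]))
    # Numeric step: multiply one batch of output columns at a time.
    for lo, hi in batches:
        yield lo, hi, A @ B[:, lo:hi]

# Example usage: each batch of output columns can be streamed to disk
# instead of being held in memory all at once.
if __name__ == "__main__":
    A = sp.random(1000, 1000, density=0.01, format="csc")
    B = sp.random(1000, 1000, density=0.01, format="csc")
    for lo, hi, C_batch in batched_spgemm(A, B, max_nnz_per_batch=50_000):
        print(f"columns [{lo}, {hi}): {C_batch.nnz} nonzeros")
```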