A parallel sparse triangular solve algorithm based on dependency elimination of the solution vector

2020 
Sparse triangular solve (SpTRSV) is an important kernel in many scientific computing applications. In traditional viewpoints, accelerating SpTRSV by parallelizing the solution process is a challenging task. Dependencies among the variables that exist in the solution process not only restrict the parallelism that can be achieved, but also introduce large synchronization overhead among the parallel tasks. Moreover, a time-consuming pre-processing phase is commonly required to identify calculations that can be parallelized. However, we have observed that a large number of dependencies among the variables can be eliminated if we only calculate partial values of the variables first and add them together to obtain the final values later. By using such a strategy, starting to solve a variable does not need to wait for all of its prerequisite variables having been solved. In consequence, parallelism of the SpTRSV can be increased significantly. In this paper, we transform above mentioned observations into a subtree-based parallel algorithm to accelerate SpTRSV. The proposed algorithm calculates partial values of the variable along with an implicit subtree traversal and utilizes hardware atomic operation to implement accumulation of the partial values. This not only introduces no pre-processing overhead, but also avoids any barrier synchronization among the parallel threads. We evaluate the proposed algorithm on 2135 matrices from SuiteSparse Matrix Collection based on a generic GPU platform. Experimental results demonstrate that our scheme outperforms the state-of-the-art GPU and CPU vendor libraries in 1949 and 1782 matrices, respectively. Compared with the latest synchronization-free method, our scheme outperforms in 1779 matrices.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    27
    References
    1
    Citations
    NaN
    KQI
    []