A Decomposition-Based Synthesis Algorithm for Sparse Matrix-Vector Multiplication in Parallel Communication Structure.

2021 
There is an obvious trend that hardware including many-core CPU, GPU and FPGA are always made use of to conduct computationally intensive tasks of deep learning implementations, a large proportion of which can be formulated into the format of sparse matrix-vector multiplication(SpMV). In contrast with dense matrix-vector multi-plication(DMV), scheduling solutions for SpMV targeting parallel processing turn out to be irregular, leading to the dilemma that the optimum synthesis problems are time-consuming or even infeasible, when the size of the involved matrix increases. In this paper, the minimum scheduling problem of 4x4 SpMV on ring-connected architecture is first studied, with two concepts named Multi-Input Vector and Multi-Output Vector introduced. The classification of 4x4 sparse matrices has been conducted, on account of which a decomposition-based synthesis algorithm for larger matrices is put forward. As the proposed method is guided by known sub-scheduling solutions, search space of the synthesis problem is considerably reduced. Through comparison with an exhaustive search method and a brute force-based parallel scheduling method, the proposed method is proved to be able to offer scheduling solutions of high-equality: averagely utilize 65.27% of the sparseness of the involved matrices and achieve 91.39% of the performance of the solutions generated by exhaustive search, with a remarkable saving of compilation time and the best scalability among the above-mentioned approaches.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    10
    References
    0
    Citations
    NaN
    KQI
    []