Batched Small Tensor-Matrix Multiplications on GPUs

2020 
We present a fine-tuned library, ZTMM, for batched small tensor-matrix multiplication on GPU architectures. Libraries that perform optimized matrix-matrix multiplications involving large matrices are available for many architectures, including GPUs. However, these libraries do not provide optimal performance for applications that require efficient multiplication of a matrix with a batch of small matrices or tensors. There has been recent interest in developing fine-tuned libraries for batched small matrix-matrix multiplication, but these efforts are limited to square matrices; ZTMM supports both square and rectangular matrices. We experimentally demonstrate that our library achieves significantly higher performance than the cuBLAS and MAGMA libraries. We also demonstrate its use in CMT-nek, a spectral-element-based solver that performs high-fidelity predictive simulations using the compressible Navier-Stokes equations. CMT-nek involves three-dimensional tensors, but the same techniques can be applied to higher-dimensional tensors.
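For context, the kind of operation described above can be expressed with the stock cuBLAS strided-batched GEMM interface, which is one of the baselines the abstract compares against. The sketch below is an illustrative baseline only, not ZTMM's API: the operator name D, the sizes n and batch, and the zero A-stride used to broadcast the shared operator across the batch are assumptions introduced for this example.

```cpp
// Minimal sketch (not ZTMM's API): each spectral element carries an n x n x n
// tensor U_e, and a shared n x n operator D is applied along the first
// dimension, i.e. for every element e, V_e = D * reshape(U_e, n, n*n).
// The sizes n and batch, the name D, and the zero-stride broadcast of D
// are assumptions made for illustration.
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 8;             // points per direction in one element (assumed)
    const int batch = 4096;      // number of elements in the batch (assumed)
    const long long tensorSize = (long long)n * n * n;

    // Device buffers: shared operator D, batched input tensors U, batched results V.
    double *dD = nullptr, *dU = nullptr, *dV = nullptr;
    cudaMalloc((void **)&dD, sizeof(double) * n * n);
    cudaMalloc((void **)&dU, sizeof(double) * tensorSize * batch);
    cudaMalloc((void **)&dV, sizeof(double) * tensorSize * batch);
    // (Initialization of dD and dU is omitted for brevity.)

    cublasHandle_t handle;
    cublasCreate(&handle);

    const double alpha = 1.0, beta = 0.0;
    // One strided-batched GEMM: V_e (n x n*n) = D (n x n) * U_e (n x n*n).
    // strideA = 0 reuses the same operator D for every element in the batch.
    cublasDgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                              n, n * n, n,
                              &alpha,
                              dD, n, 0,              // shared operator, broadcast
                              dU, n, tensorSize,     // one tensor per element
                              &beta,
                              dV, n, tensorSize,     // one result per element
                              batch);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(dD);
    cudaFree(dU);
    cudaFree(dV);
    return 0;
}
```

Each per-element multiplication here is small (an 8x8 matrix against an 8x64 reshaped tensor), which is exactly the regime where, per the abstract, general-purpose batched GEMM routines fall short of optimal performance.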