Matched Filtering Accelerated by Tensor Cores on Volta GPUs with Improved Accuracy using Half Precision Variables

2019 
Matched Filtering is applicable in many fields because it computes the correlation coefficient between two vectors and can therefore detect events that match many templates. With improvements in observation techniques, massive amounts of observation data and templates have accumulated, making the reduction of the computational cost of Matched Filtering an important issue. This computation consists mainly of matrix-matrix products, which the Tensor Cores on NVIDIA Volta GPUs are expected to compute rapidly. However, the actual performance of Tensor Cores is usually limited by the bandwidth of shared memory or global memory. In addition, the current API for Tensor Cores supports only lower-precision data types, so a loss of accuracy in the computation must be prevented. In this letter, we designed a Matched Filtering algorithm that solves these problems and exploits the high arithmetic throughput of Tensor Cores. Specifically, we reduced the number of accesses to global memory and shared memory by using a low-level description of the kernel. In addition, we introduced local normalization to reduce the numerical error. We applied our kernel to template matching of seismic observation data and compared its performance and accuracy with those of cuBLAS, a common library for GPU computation. Compared with the cuBLAS function that offered almost the same accuracy as our kernel, our kernel reduced the elapsed time by a factor of 4.74.
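The reduction of Matched Filtering to a matrix-matrix product, and the reason normalization matters before dropping to half precision, can be illustrated with the short sketch below. It is a NumPy emulation under stated assumptions, not the authors' CUDA kernel. Each template and each sliding window of the data is centered and scaled to unit norm, so one matrix product yields every correlation coefficient at once. The Tensor Core is mimicked by rounding the inputs to float16 and accumulating in float32. Treating "local normalization" as per-window normalization applied *before* the float16 cast is an assumption; the paper's exact scheme may differ. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def sliding_windows(data: np.ndarray, m: int) -> np.ndarray:
    """(n_windows, m) view whose row j is data[j:j+m]."""
    return np.lib.stride_tricks.sliding_window_view(data, m)


def normalize_rows(x: np.ndarray) -> np.ndarray:
    """Center each row and scale it to unit L2 norm, so that row dot
    products become correlation coefficients in [-1, 1]."""
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0.0, 1.0, norms)


def tc_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Emulate Tensor Core mixed precision: float16 inputs, float32
    accumulation (a product of two float16 values is exact in float32)."""
    a16 = a.astype(np.float16).astype(np.float32)
    b16 = b.astype(np.float16).astype(np.float32)
    return a16 @ b16


def matched_filter_reference(templates: np.ndarray, data: np.ndarray) -> np.ndarray:
    """float64 reference: (n_templates, n_windows) correlation coefficients."""
    m = templates.shape[1]
    return normalize_rows(templates) @ normalize_rows(sliding_windows(data, m)).T


def matched_filter_half_normalized(templates: np.ndarray, data: np.ndarray) -> np.ndarray:
    """Hypothetical variant: normalize in float64 first, then cast to float16."""
    m = templates.shape[1]
    return tc_matmul(normalize_rows(templates),
                     normalize_rows(sliding_windows(data, m)).T)


def matched_filter_half_naive(templates: np.ndarray, data: np.ndarray) -> np.ndarray:
    """Hypothetical variant: cast raw values to float16 first, then normalize.
    Raw amplitudes above 65504 overflow to inf and the result turns into NaN."""
    m = templates.shape[1]
    with np.errstate(over="ignore", invalid="ignore"):
        t16 = templates.astype(np.float16)
        d16 = data.astype(np.float16)
        return tc_matmul(normalize_rows(t16),
                         normalize_rows(sliding_windows(d16, m)).T)


# Synthetic stand-in for observation data with large raw amplitudes.
rng = np.random.default_rng(0)
data = 1e5 * rng.standard_normal(4096)
# Two 64-sample templates cut from the data itself.
templates = np.stack([data[1000:1064], data[3000:3064]])

ref = matched_filter_reference(templates, data)
half = matched_filter_half_normalized(templates, data)
naive = matched_filter_half_naive(templates, data)

print(ref.shape)                         # (2, 4033)
print(np.argmax(ref, axis=1))            # [1000 3000]
print(np.abs(half - ref).max() < 2e-3)   # True
print(np.isfinite(naive).all())          # False
```

Because every normalized vector has unit norm, the float16 rounding error of each coefficient is bounded by roughly twice the float16 unit roundoff (about 1e-3), whatever the raw amplitude of the data. Casting first can overflow or lose the signal entirely.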