Towards automated kernel fusion for the optimisation of scientific applications

2020 
In this paper, we introduce a novel transformation pass, implemented in LLVM, that performs kernel fusion. We demonstrate the correctness and performance of the pass on several example programs inspired by scientific applications of interest. The method achieves up to 4× speedup relative to unfused versions of the programs, and exact performance parity with manually fused versions. In contrast to previous work, it also requires minimal user intervention. Our approach is facilitated by a new loop fusion algorithm capable of interprocedurally fusing both skewed and unskewed loops in different kernels.
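To make the idea of fusing loops across kernels concrete, the following is a minimal hand-written sketch in C. It is not the paper's LLVM pass or its algorithm. The kernel names (`scale`, `pair_sum`, `scale_pair_sum_fused`) and the helper `check_equivalence` are hypothetical and chosen only for illustration. The second kernel reads `tmp[i + 1]`, so its body has to be shifted (skewed) by one iteration before the two loops can share a single traversal.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel 1 (hypothetical): scale every input element. */
static void scale(const double *in, double *tmp, size_t n) {
    for (size_t i = 0; i < n; ++i)
        tmp[i] = 2.0 * in[i];
}

/* Kernel 2 (hypothetical): sum neighbouring elements of kernel 1's output.
 * Writes n - 1 results. */
static void pair_sum(const double *tmp, double *out, size_t n) {
    for (size_t i = 0; i + 1 < n; ++i)
        out[i] = tmp[i] + tmp[i + 1];
}

/* Hand-fused version of the two kernels above.
 * Kernel 2's iteration i needs tmp[i + 1], which kernel 1 only produces
 * one iteration later, so kernel 2's body runs one iteration behind
 * (a skew of 1) inside the shared loop. */
static void scale_pair_sum_fused(const double *in, double *tmp,
                                 double *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        tmp[i] = 2.0 * in[i];
        if (i >= 1)
            out[i - 1] = tmp[i - 1] + tmp[i];
    }
}

/* Checks that the fused loop produces the same results as running the
 * two kernels one after the other on a small input. */
void check_equivalence(void) {
    enum { N = 5 };
    const double in[N] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double tmp_a[N], tmp_b[N], out_a[N - 1], out_b[N - 1];

    scale(in, tmp_a, N);
    pair_sum(tmp_a, out_a, N);
    scale_pair_sum_fused(in, tmp_b, out_b, N);

    for (size_t i = 0; i + 1 < N; ++i)
        assert(out_a[i] == out_b[i]);
}
```

In the unfused form, `tmp` is written in full by one kernel and then read back by the next. The fused loop consumes each value shortly after it is produced, which is the kind of locality benefit kernel fusion aims for. An automated pass would additionally have to establish that such a reordering preserves all data dependences.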