An Optimization Toolchain Design of Deep Learning Deployment Based on Heterogeneous Computing Platform

2020 
Progress in co-processor acceleration has enabled fast, high-performance deployment of computation-intensive applications such as deep learning algorithms. Recently, new heterogeneous cooperation patterns have been studied to obtain further acceleration on discrete or large-scale computing systems. In this paper, we explore auto-tuning strategies for task scheduling in heterogeneous cooperation between kernel operators. The tuning outcomes are designed to work in harmony with the kernel operators optimized by the TVM compiler. To integrate with mainstream frameworks, we build a toolchain based on the Heterogeneous System Architecture (HSA) and the ROCm platform for deep learning deployment. Compared with the original TVM and official TensorFlow on the ROCm platform, our work achieves higher inference speed.
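The abstract's core idea of tuning task scheduling across heterogeneous devices can be illustrated with a small sketch. The code below is not the paper's implementation; it is a hypothetical dynamic-programming scheduler that assigns each kernel operator in a linear chain to either the CPU or the GPU, trading per-operator compute latency against cross-device transfer cost. All function names and cost figures are illustrative assumptions.

```python
# Illustrative sketch (not the paper's method): pick a device for each
# operator in a linear chain so total compute + transfer time is minimal.
# Device 0 = CPU, device 1 = GPU; all costs are hypothetical latencies (ms).

def schedule_chain(cpu_cost, gpu_cost, transfer_cost):
    """cpu_cost/gpu_cost: per-operator latencies; transfer_cost: cost of
    moving an intermediate tensor between devices.
    Returns (best_total_time, per-operator device assignment)."""
    n = len(cpu_cost)
    # dp[d] = best time for a schedule ending with the current op on device d
    dp = [cpu_cost[0], gpu_cost[0]]
    choice = [[0, 1]]  # back-pointers for reconstructing the assignment
    for i in range(1, n):
        new_dp, back = [0.0, 0.0], [0, 0]
        for d, cost in ((0, cpu_cost[i]), (1, gpu_cost[i])):
            stay = dp[d] + cost                       # previous op on same device
            move = dp[1 - d] + transfer_cost + cost   # previous op on the other device
            if stay <= move:
                new_dp[d], back[d] = stay, d
            else:
                new_dp[d], back[d] = move, 1 - d
        dp = new_dp
        choice.append(back)
    # Walk the back-pointers from the cheaper final device.
    d = 0 if dp[0] <= dp[1] else 1
    assignment = [d]
    for i in range(n - 1, 0, -1):
        d = choice[i][d]
        assignment.append(d)
    assignment.reverse()
    return min(dp), assignment

# Example: op1 is much faster on the GPU, its neighbors on the CPU, so the
# optimal schedule pays two transfers to run op1 on the GPU.
total, devices = schedule_chain([1, 10, 1], [5, 2, 5], transfer_cost=1)
print(total, devices)  # → 6 [0, 1, 0]
```

A real auto-tuner would measure these latencies on the target hardware and handle general operator graphs rather than a chain, but the same cost-model trade-off drives the search.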