An Optimization Toolchain Design of Deep Learning Deployment Based on Heterogeneous Computing Platform

2020 
Progress in co-processor acceleration has enabled fast, high-performance deployment of computation-intensive applications such as deep learning algorithms. Recently, new heterogeneous cooperation patterns have been studied to obtain further acceleration on discrete or large-scale computing systems. In this paper, we explore auto-tuning strategies for task scheduling in heterogeneous cooperation between kernel operators. The tuning outcomes are designed to work in harmony with the kernel operators optimized by the TVM compiler. To integrate with mainstream frameworks, we build a toolchain based on the Heterogeneous System Architecture (HSA) and the ROCm platform for deep learning deployment. Compared with the original TVM and official TensorFlow on the ROCm platform, our work achieves higher inference speed.
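The abstract's core idea of tuning task scheduling across heterogeneous devices can be illustrated with a small sketch. The code below is not the paper's implementation; it is a hypothetical dynamic-programming scheduler that assigns each kernel operator in a linear chain to either the CPU or the GPU, trading per-operator compute latency against cross-device transfer cost. All function names and cost figures are illustrative assumptions.

```python
# Illustrative sketch (not the paper's method): pick a device for each
# operator in a linear chain so total compute + transfer time is minimal.
# Device 0 = CPU, device 1 = GPU; all costs are hypothetical latencies (ms).

def schedule_chain(cpu_cost, gpu_cost, transfer_cost):
    """cpu_cost/gpu_cost: per-operator latencies; transfer_cost: cost of
    moving an intermediate tensor between devices.
    Returns (best_total_time, per-operator device assignment)."""
    n = len(cpu_cost)
    # dp[d] = best time for a schedule ending with the current op on device d
    dp = [cpu_cost[0], gpu_cost[0]]
    choice = [[0, 1]]  # back-pointers for reconstructing the assignment
    for i in range(1, n):
        new_dp, back = [0.0, 0.0], [0, 0]
        for d, cost in ((0, cpu_cost[i]), (1, gpu_cost[i])):
            stay = dp[d] + cost                       # previous op on same device
            move = dp[1 - d] + transfer_cost + cost   # previous op on the other device
            if stay <= move:
                new_dp[d], back[d] = stay, d
            else:
                new_dp[d], back[d] = move, 1 - d
        dp = new_dp
        choice.append(back)
    # Walk the back-pointers from the cheaper final device.
    d = 0 if dp[0] <= dp[1] else 1
    assignment = [d]
    for i in range(n - 1, 0, -1):
        d = choice[i][d]
        assignment.append(d)
    assignment.reverse()
    return min(dp), assignment

# Example: op1 is much faster on the GPU, its neighbors on the CPU, so the
# optimal schedule pays two transfers to run op1 on the GPU.
total, devices = schedule_chain([1, 10, 1], [5, 2, 5], transfer_cost=1)
print(total, devices)  # → 6 [0, 1, 0]
```

A real auto-tuner would measure these latencies on the target hardware and handle general operator graphs rather than a chain, but the same cost-model trade-off drives the search.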