Vyasa: A High-Performance Vectorizing Compiler for Tensor Convolutions on the Xilinx AI Engine

Prasanth Chatarasi, Stephen Neuendorffer, Samuel Bayliss, Kees Vissers, Vivek Sarkar

2020

Xilinx's AI Engine is a recent industry example of energy-efficient vector processing that includes novel support for 2D SIMD datapaths and a shuffle interconnection network. The current approach to programming the AI Engine relies on a C/C++ API for vector intrinsics. While this is an advance over assembly-level programming, it requires the programmer to specify a number of low-level operations based on detailed knowledge of the hardware. To address these challenges, we introduce Vyasa, a new programming system that extends the Halide DSL compiler to automatically generate code for the AI Engine. We evaluated Vyasa on 36 CONV2D workloads and achieved geometric means of 7.6 and 24.2 MACs/cycle for 32-bit and 16-bit operands, which represent 95.9% and 75.6% of the peak performance, respectively.
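
For readers unfamiliar with the metric, the following is a minimal scalar sketch of what a CONV2D workload computes and how a MACs/cycle figure is derived. It is not Vyasa's generated code, not Halide, and not the AI Engine intrinsics API; it assumes a single channel, stride 1, and no padding, and the names `conv2d_reference` and `macs_per_cycle` are hypothetical helpers introduced only for illustration.

```python
def conv2d_reference(inp, weights):
    """Direct 2D convolution (single channel, stride 1, no padding).

    inp is an H x W list of lists and weights is a KH x KW list of lists.
    Returns the output image and the number of multiply-accumulates (MACs).
    """
    h, w = len(inp), len(inp[0])
    kh, kw = len(weights), len(weights[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    macs = 0
    for y in range(out_h):              # output rows
        for x in range(out_w):          # output columns
            acc = 0
            for r in range(kh):         # kernel rows
                for s in range(kw):     # kernel columns
                    acc += inp[y + r][x + s] * weights[r][s]
                    macs += 1           # one multiply-accumulate
            out[y][x] = acc
    return out, macs


def macs_per_cycle(macs, cycles):
    """Throughput metric: MACs completed per clock cycle."""
    return macs / cycles


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    kernel = [[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1]]
    result, macs = conv2d_reference(image, kernel)
    print(result, macs)
    # The cycle count here is an assumed value, not a measurement.
    print(macs_per_cycle(macs, cycles=9))
```

A vectorizing compiler aims to execute many of the innermost multiply-accumulates in a single cycle, so the achieved MACs/cycle can be compared against the hardware peak for a given operand width.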

Keywords:

Intrinsics
Computer science
Operand
Domain-specific language
Parallel computing
SIMD
Compiler
Interconnection
Vector processor
Programmer
