FPGA-based CNN inference accelerator synthesized from multi-threaded C software

Jin Hee Kim, Brett Grady, Ruolong Lian, Jason H. Anderson

2017

A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model, with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) tool [1] synthesizes the threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated in which convolution, pooling, and padding are realized in the synthesized accelerator, with the remaining tasks executing on an embedded ARM processor. The accelerator incorporates reduced precision and a novel approach to zero-weight skipping in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
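The producer/consumer structure described above can be illustrated with a minimal sketch in plain C with Pthreads. This is not the authors' code, and it does not reproduce their zero-weight-skipping scheme: the names `fifo_t`, `item_t`, `fifo_write`, `fifo_read`, and `dot_product_pipeline` are hypothetical, the FIFO is a hand-rolled bounded queue rather than any LegUp-provided one, and the zero-weight filter in the producer is only a simple stand-in for the general idea of not spending multiply-accumulate work on zero weights.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FIFO_DEPTH 4  /* assumed depth, chosen only for illustration */

/* One queue entry: a weight/activation pair, or an end-of-stream marker. */
typedef struct { int weight; int activation; int last; } item_t;

/* Hypothetical bounded FIFO guarded by a mutex and two condition variables. */
typedef struct {
    item_t buf[FIFO_DEPTH];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} fifo_t;

static void fifo_init(fifo_t *f) {
    f->head = f->tail = f->count = 0;
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->not_empty, NULL);
    pthread_cond_init(&f->not_full, NULL);
}

/* Blocks while the queue is full, then appends one item. */
static void fifo_write(fifo_t *f, item_t it) {
    pthread_mutex_lock(&f->lock);
    while (f->count == FIFO_DEPTH)
        pthread_cond_wait(&f->not_full, &f->lock);
    f->buf[f->tail] = it;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    pthread_cond_signal(&f->not_empty);
    pthread_mutex_unlock(&f->lock);
}

/* Blocks while the queue is empty, then removes the oldest item. */
static item_t fifo_read(fifo_t *f) {
    pthread_mutex_lock(&f->lock);
    while (f->count == 0)
        pthread_cond_wait(&f->not_empty, &f->lock);
    item_t it = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    pthread_cond_signal(&f->not_full);
    pthread_mutex_unlock(&f->lock);
    return it;
}

typedef struct { const int *w; const int *a; size_t n; fifo_t *out; } producer_args_t;
typedef struct { fifo_t *in; long sum; } consumer_args_t;

/* Producer: streams only the pairs whose weight is nonzero. */
static void *producer(void *p) {
    producer_args_t *args = p;
    for (size_t i = 0; i < args->n; i++) {
        if (args->w[i] == 0)
            continue;  /* zero weight contributes nothing to the sum */
        item_t it = { args->w[i], args->a[i], 0 };
        fifo_write(args->out, it);
    }
    item_t end = { 0, 0, 1 };  /* end-of-stream marker */
    fifo_write(args->out, end);
    return NULL;
}

/* Consumer: multiply-accumulates until it sees the end-of-stream marker. */
static void *consumer(void *p) {
    consumer_args_t *args = p;
    args->sum = 0;
    for (;;) {
        item_t it = fifo_read(args->in);
        if (it.last)
            break;
        args->sum += (long)it.weight * it.activation;
    }
    return NULL;
}

/* Runs one producer and one consumer thread connected by a FIFO. */
long dot_product_pipeline(const int *w, const int *a, size_t n) {
    fifo_t q;
    fifo_init(&q);
    producer_args_t pa = { w, a, n, &q };
    consumer_args_t ca = { &q, 0 };
    pthread_t tp, tc;
    int rc = pthread_create(&tp, NULL, producer, &pa);
    assert(rc == 0);
    rc = pthread_create(&tc, NULL, consumer, &ca);
    assert(rc == 0);
    (void)rc;
    pthread_join(tp, NULL);
    pthread_join(tc, NULL);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    pthread_mutex_destroy(&q.lock);
    return ca.sum;
}
```

Calling `dot_product_pipeline(w, a, n)` on two equal-length integer arrays returns their dot product. In software the two threads merely overlap in time; the abstract's point is that an HLS tool can map each such thread to its own hardware unit, so the same structure becomes spatial parallelism on the FPGA.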

Keywords:

Instruction set
Convolution
Thread (computing)
Parallel computing
Padding
Field-programmable gate array
POSIX Threads
Software
Computer science
ARM architecture
