A High-Throughput Detection Circuit Based on 2q+1-Valued Deep Neural Networks

Naoto Soga, Ryosuke Kuramochi, Hiroki Nakahara

2021

Demand for applications that run high-speed deep learning models in data centers is increasing rapidly. However, most existing accelerators rely on many memory accesses and DSP blocks, which become performance bottlenecks. We present a lookup table (LUT) mapping that maps convolutional layers, which are widely used in modern deep learning models, directly onto LUTs. To reduce the number of LUTs, we develop a training method for sparse local convolution (SLC), which trains sparse convolutional layers with unshared weight kernels in a 2q+1-valued representation so that zero-weight edges can be eliminated. Compared with conventional sparse CNN training methods, SLC training reduces multiply-accumulate operations by 88% while maintaining the same accuracy. We implement an LUT-based convolutional layer circuit with 10^5 to 10^6 LUTs, which fits on data center FPGAs and operates at 500 MHz (500 MFPS).
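The abstract does not spell out the quantizer, so the following is only a minimal sketch of how a 2q+1-valued representation can expose zero-weight edges for removal. It assumes that "2q+1-valued" means the integer levels -q, ..., 0, ..., q scaled by a uniform step; the function name `quantize_2q_plus_1`, the `step` parameter, and the sample weights are hypothetical and not taken from the paper.

```python
import numpy as np

def quantize_2q_plus_1(w, q, step):
    """Round weights to integer levels in {-q, ..., 0, ..., q}.

    Assumption: uniform step size and round-to-nearest; the paper's
    actual training-time quantizer may differ.
    """
    return np.clip(np.rint(w / step), -q, q).astype(int)

# Hypothetical weights of one unshared (local) kernel.
w = np.array([0.04, -0.31, 0.52, -0.02, 0.95])

# q = 2 gives 2q + 1 = 5 possible levels: -2, -1, 0, 1, 2.
levels = quantize_2q_plus_1(w, q=2, step=0.25)

# Edges whose level is zero contribute nothing to the sum, so they
# need no multiply-accumulate and no LUT input.
kept = np.flatnonzero(levels)

print(levels.tolist())  # [0, -1, 2, 0, 2]
print(len(kept), "of", len(w), "edges remain")
```

In this toy case two of the five edges quantize to zero and can be dropped before mapping the layer to LUTs; the 88% reduction reported in the abstract comes from the authors' SLC training, not from this sketch.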
