FPGA accelerator for CNN: an exploration of the kernel structured sparsity and hybrid arithmetic computation

Guanwen Zhang, Song Zhou, Zhemin Duan, Wei Zhou

2021

The deployment of large-scale deep neural networks on field-programmable gate array (FPGA) platforms is severely hindered by their high demands on computational resources and off-chip data bandwidth. Traditional nonstructured sparsity algorithms can efficiently reduce the number of nonzero weights in neural network models. However, the nonstructured sparse connections across channels also reduce the degree of computational parallelism and consequently degrade the performance of the FPGA accelerator considerably. We propose an FPGA accelerator for convolutional neural networks (CNNs) that exploits kernel structured sparsity and hybrid arithmetic computation. On the one hand, we introduce a hardware-friendly kernel pruning method to reduce the number of arithmetic operations of the CNN model. The proposed method maintains high accuracy (less than 0.32% accuracy loss) and preserves a high degree of parallelism. On the other hand, we design a specific hybrid arithmetic computation scheme for the FPGA accelerator to speed up the pruned CNN model. The FPGA accelerator consists of only 64 sets of hybrid 8-bit and 16-bit floating-point units for the convolution operation. Experiments on VGGNet16 demonstrate that the proposed FPGA accelerator achieves a state-of-the-art 5× reduction in convolution operations and a 3× parameter compression. The proposed FPGA accelerator runs at 13.2 FPS, and the corresponding energy efficiency reaches up to 1.9 images/J.
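To make the idea of kernel structured sparsity concrete, the following is a minimal sketch of kernel-level pruning, not the authors' method. The abstract does not state the pruning criterion, so this sketch assumes an L1-norm ranking and assumes that every filter keeps the same number of kernels (so that each output channel has an equal workload). The names `prune_kernels`, `kernel_l1`, and `keep_per_filter` are hypothetical.

```python
from typing import List

Kernel = List[List[float]]      # one 2-D k x k kernel
Weights = List[List[Kernel]]    # indexed [out_channel][in_channel]


def kernel_l1(kernel: Kernel) -> float:
    """Sum of absolute weights in one 2-D kernel."""
    return sum(abs(w) for row in kernel for w in row)


def prune_kernels(weights: Weights, keep_per_filter: int) -> Weights:
    """Zero all but the `keep_per_filter` largest-L1 kernels in each filter.

    Assumption (not from the paper): the ranking criterion is the L1 norm,
    and every filter keeps the same number of kernels.
    """
    pruned: Weights = []
    for filt in weights:
        # Rank input-channel indices by kernel magnitude, largest first.
        order = sorted(range(len(filt)),
                       key=lambda i: kernel_l1(filt[i]),
                       reverse=True)
        keep = set(order[:keep_per_filter])
        pruned.append([
            # Copy kept kernels; replace the rest with all-zero kernels.
            [row[:] for row in k] if i in keep
            else [[0.0] * len(row) for row in k]
            for i, k in enumerate(filt)
        ])
    return pruned


def count_nonzero_kernels(weights: Weights) -> int:
    """Number of kernels that still contain at least one nonzero weight."""
    return sum(1 for filt in weights for k in filt if kernel_l1(k) > 0.0)
```

Because whole kernels are removed rather than individual weights, the surviving computation stays regular: a hardware scheduler can skip a pruned kernel entirely, whereas scattered nonzero weights would have to be tracked element by element.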
