An FPGA-based Fine Tuning Accelerator for a Sparse CNN

2019 
Fine-tuning exploits the rich feature representations that a CNN pre-trained on a wide range of natural images has already learned, and it can be applied to many neural network (NN)-based computer vision problems. This paper proposes an FPGA-based fine-tuning accelerator for a sparse convolutional neural network (CNN). The proposed architecture consists of sparse convolutional units and pooling units with distributed stacks that are suited to a sparse CNN. Additionally, this paper presents a fine-tuning scheme that loads a pre-trained sparse CNN to reduce the memory footprint of the training step. As a result, the scheme stores all parameters in on-chip BRAMs and UltraRAMs (in the case of the Xilinx Virtex UltraScale+ FPGA), which accelerates the training computation and reduces power consumption by eliminating energy-costly DRAM accesses. We implemented the design on a Xilinx Virtex UltraScale+ VCU1525 acceleration development kit. Experimental results show that the proposed sparse fine-tuning accelerator on the FPGA is 4.0 times faster, consumes 2.9 times less power, and delivers 11.6 times better performance per watt than an NVIDIA GTX 1080 Ti GPU.
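To illustrate the computation the sparse convolutional units avoid, the sketch below applies a 2D convolution while visiting only the nonzero weight taps, so work scales with the number of surviving weights rather than the full kernel size. This is a minimal software analogy, not the authors' hardware design; the function name, the valid-padding/stride-1 choice, and the pruning threshold are illustrative assumptions.

```python
import numpy as np

def sparse_conv2d(x, weight, threshold=0.0):
    """Sketch of a sparse 2D convolution (valid padding, stride 1).

    Only nonzero weight entries are visited, mirroring the idea of
    sparse convolutional units that skip zeroed (pruned) weights.
    All names here are illustrative, not the paper's API.
    """
    kh, kw = weight.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    # Enumerate the surviving (nonzero) taps once, up front.
    taps = [(i, j, w) for (i, j), w in np.ndenumerate(weight)
            if abs(w) > threshold]
    # Each surviving tap contributes one shifted, scaled copy of the input.
    for i, j, w in taps:
        out += w * x[i:i + oh, j:j + ow]
    return out
```

With a heavily pruned kernel, the tap list is short, so the loop body runs far fewer times than a dense implementation would; the hardware analogue is that pruned weights consume no multiply-accumulate cycles.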