Accelerating Backward Convolution of Convolutional Neural Networks on BWDSP

2018 
Convolutional neural network (CNN), a well-known deep learning architecture extended from the artificial neural network, has been extensively applied in many applications, including image recognition, text classification and robot vision. In particular, quite a few inference accelerators have been proposed on embedded systems such as FPGA, ASIC and DSP platforms, owing to their advantages of short development cycle, reconfigurability and high performance, whereas the backpropagation of CNN on edge/embedded systems has received far less attention. It is well known that backpropagation is very demanding on hardware resources: the training platform must provide large bandwidth and sufficient computing resources, which imposes strict performance requirements. In this paper, we therefore focus on the backward convolution of deep CNN, and implement and optimize a backward-convolution (deconvolution) algorithm with loop unrolling, software pipelining and multi-macro data partitioning on a digital signal processor (BWDSP), exploiting its architecture and instruction features. The experimental results show that our method achieves a performance of 11.07 GFLOPS on one core at a 500 MHz working clock frequency. We then take VGGNet-19 as a case study to verify the effectiveness of our method and compare the efficiency of the accelerator against a CPU (Intel Celeron 1005M @ 1.90 GHz). Finally, we compare it with previous approaches; the results show that our implementation outperforms both the CPU and FPGA with equivalent computing resources and can be applied to scenarios with continuous learning requirements.
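As a rough illustration of what the backward convolution computes, the following C sketch propagates the output-feature-map gradient back to the input for a single channel with stride 1 and no padding. It is a minimal, assumed reference form (the function name, array sizes and layout are illustrative), not the paper's BWDSP kernel; the small inner loops over the kernel window are the natural targets for the loop unrolling and software pipelining mentioned in the abstract.

```c
/* Minimal sketch of backward convolution (gradient w.r.t. the input),
 * single channel, stride 1, no padding. Names and sizes are illustrative
 * assumptions, not the paper's BWDSP implementation. */
#define K   3              /* kernel height/width        */
#define OUT 6              /* output feature-map size    */
#define IN  (OUT + K - 1)  /* input feature-map size     */

void conv_backward_input(const float grad_out[OUT][OUT],
                         const float weight[K][K],
                         float grad_in[IN][IN])
{
    /* Clear the input-gradient buffer. */
    for (int y = 0; y < IN; y++)
        for (int x = 0; x < IN; x++)
            grad_in[y][x] = 0.0f;

    /* Scatter each output gradient through the kernel window.
     * The K*K inner loops are candidates for unrolling and
     * software pipelining on a VLIW DSP. */
    for (int oy = 0; oy < OUT; oy++)
        for (int ox = 0; ox < OUT; ox++)
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    grad_in[oy + ky][ox + kx] +=
                        grad_out[oy][ox] * weight[ky][kx];
}
```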