Dual-Precision Acceleration of Convolutional Neural Network Computation with Mixed Input and Output Data Reuse

2019 
Memory access dominates power consumption in hardware acceleration of deep neural network (DNN) computation due to the movement of large volumes of activations and weights. This paper presents a DNN accelerator that uses a mixed input and output data reuse scheme to balance internal memory size against memory access count, two contradictory design goals in resource-limited embedded systems. First, analytical forms for memory size and access count are derived for different data reuse methods in DNN convolution. After comparing the analysis results across the convolutional layers of the VGG-16 model under different levels of hardware parallelism, we implement a low-cost DNN hardware accelerator using the mixed input and output data reuse scheme with 32 processing elements (PEs) operating in parallel. Furthermore, the design supports two precision modes (8-bit and 16-bit) to accommodate variable precision requirements across DNN layers, yielding more efficient computation than single-precision designs through sharing of hardware resources.
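The trade-off the abstract describes can be illustrated with a toy first-order access model. The formulas below are simplified assumptions for illustration only, not the paper's actual analytical forms: they count off-chip word accesses for a stride-1, same-padded convolution under two extreme reuse schemes (output-stationary vs. input-stationary), and the layer shape is a VGG-16-like example chosen for the sketch.

```python
def conv_accesses(H, W, C, M, K, scheme):
    """Estimate off-chip word accesses for an H x W x C input,
    M output channels, K x K kernels, stride 1, 'same' padding.
    A hypothetical first-order model, not the paper's derivation."""
    inputs = H * W * C        # input activation words
    weights = M * K * K * C   # weight words
    outputs = H * W * M       # output activation words
    if scheme == "output_reuse":
        # Partial sums stay on chip: each output is written once,
        # but every input is re-fetched for each output channel.
        return inputs * M + weights + outputs
    elif scheme == "input_reuse":
        # Inputs are read once; partial sums spill off chip: each
        # output is read and written once per input channel.
        return inputs + weights + outputs * 2 * C
    raise ValueError(scheme)

# VGG-16-like 56x56x256 layer with 256 3x3 filters (assumed shape)
for s in ("output_reuse", "input_reuse"):
    print(s, conv_accesses(56, 56, 256, 256, 3, s))
```

Under this toy model neither extreme wins for every layer shape, which is the motivation for the mixed scheme: on-chip buffer capacity determines how much of each operand can be held stationary, and a mixed policy can tune that split per layer.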