IDLA: An Instruction-based Adaptive CNN Accelerator

2020 
In this paper, we propose an instruction-based adaptive CNN accelerator named IDLA for fast and efficient deployment of CNN models on FPGAs. The hardware engine of IDLA accelerates the computation of CNN models by adaptively using different functional modules. Following a modular design approach, the hardware engine is carefully designed so that all of these modules work concurrently, improving the utilization of on-chip resources. In addition, layer fusion and weight reuse strategies are applied to reduce data accesses to DDR. Coordinating with this hardware engine, a network parser is developed to automatically analyze different CNN models and generate an optimal scheduling scheme for each of them. Moreover, a customized instruction set with moderate granularity is designed to further enhance flexibility in the joint optimization of software and hardware. We build IDLA on a Xilinx VU9P FPGA. The experimental results show that the proposed IDLA accelerator achieves 168.76 GOPS on ResNet18 and 277.63 GOPS on VGG16-SVD, with a DSP efficiency of 1.62 Ops/DSP/cycle (VGG16-SVD), significantly outperforming existing state-of-the-art designs.
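The DSP efficiency metric quoted above relates throughput to the number of DSP slices and the clock frequency. Below is a minimal sketch of that relationship; the DSP count and clock frequency used in the example are hypothetical placeholders, since the abstract does not state the actual resource usage or operating frequency.

```python
# Minimal sketch (not from the paper): relating reported throughput
# to DSP efficiency. The DSP count and clock frequency below are
# hypothetical placeholders chosen only for illustration.

def dsp_efficiency(gops, num_dsp, clock_mhz):
    """Ops per DSP per cycle = total ops/s divided by (DSPs x cycles/s)."""
    ops_per_second = gops * 1e9
    cycles_per_second = clock_mhz * 1e6
    return ops_per_second / (num_dsp * cycles_per_second)

if __name__ == "__main__":
    # Hypothetical configuration: 857 DSP slices at 200 MHz would yield
    # roughly the 1.62 Ops/DSP/cycle reported for VGG16-SVD at 277.63 GOPS.
    print(f"{dsp_efficiency(277.63, 857, 200):.2f} Ops/DSP/cycle")
```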