A 12.1 TOPS/W Mixed-precision Quantized Deep Convolutional Neural Network Accelerator for Low Power on Edge / Endpoint Device

2020 
This work proposes a mixed-precision quantized deep convolutional neural network accelerator that supports different precisions, such as binary, ternary, and 8-bit integer (INT8), for different layers. Some layers have precision redundancy and can be quantized to lower bit widths such as binary or ternary, while layers that are sensitive to bit precision should be quantized to INT8. For object detection with Tiny-YOLOv3, the accelerator achieves 12.1 TOPS/W energy efficiency by quantizing the neural network to mixed precision while maintaining recognition accuracy. Furthermore, deep neural network computation requires large amounts of weight and input data, so memory access efficiency is critical given the limited memory bandwidth of a system-on-chip (SoC). By arranging data placement in memory so that the data can be scanned efficiently, the accelerator achieves over 90% access efficiency of the DRAM bandwidth. As a result, it achieves both low power consumption and high performance while maintaining recognition accuracy equivalent to 32-bit floating point (FP32) across many neural networks.
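As a rough illustration of the per-layer mixed-precision scheme described above, the sketch below quantizes precision-redundant layers to binary or ternary weights and precision-sensitive layers to INT8. The quantizer formulas, the ternary threshold factor, and the layer-to-precision mapping are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def quantize_binary(w: np.ndarray) -> np.ndarray:
    """Binary (1-bit) weights: +alpha or -alpha, with alpha set to the
    mean absolute weight so the layer's magnitude is roughly preserved."""
    alpha = np.abs(w).mean()
    return np.where(w >= 0, alpha, -alpha)

def quantize_ternary(w: np.ndarray, t: float = 0.7) -> np.ndarray:
    """Ternary weights: small values collapse to 0, the rest to +/-alpha.
    The threshold factor t=0.7 is a common heuristic, assumed here."""
    delta = t * np.abs(w).mean()
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return np.where(mask, np.sign(w) * alpha, 0.0)

def quantize_int8(w: np.ndarray) -> np.ndarray:
    """Symmetric INT8: round to integers in [-127, 127], then rescale."""
    scale = max(np.abs(w).max(), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale

# Hypothetical per-layer assignment: precision-redundant layers get low-bit
# weights, precision-sensitive layers stay at INT8.
LAYER_PRECISION = {"conv1": "int8", "conv2": "binary", "conv3": "ternary"}
QUANTIZERS = {"binary": quantize_binary, "ternary": quantize_ternary,
              "int8": quantize_int8}

def quantize_model(weights: dict) -> dict:
    """Apply the assigned quantizer to every layer's weight tensor."""
    return {name: QUANTIZERS[LAYER_PRECISION[name]](w)
            for name, w in weights.items()}
```

The abstract's second claim, over 90% DRAM bandwidth utilization through data placement, rests on arranging tensors in memory so the accelerator reads them as long sequential bursts rather than strided accesses. Below is a minimal sketch of one such repacking; the NHWC layout and the channel tile size are assumptions, not the paper's scheme.

```python
def tile_for_sequential_access(x: np.ndarray, tile_c: int = 16) -> np.ndarray:
    """Repack an NHWC activation tensor so that each tile of tile_c channels
    is stored contiguously, letting the accelerator fetch it in one long
    DRAM burst instead of many short strided reads."""
    n, h, w, c = x.shape
    assert c % tile_c == 0, "channel count must be a multiple of the tile size"
    tiled = x.reshape(n, h, w, c // tile_c, tile_c).transpose(0, 3, 1, 2, 4)
    return np.ascontiguousarray(tiled).ravel()
```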