A 12.1 TOPS/W Mixed-precision Quantized Deep Convolutional Neural Network Accelerator for Low Power on Edge / Endpoint Device

2020 
This work proposes a mixed-precision quantized deep convolutional neural network accelerator that supports different precisions, such as binary, ternary, and 8-bit integer (INT8), for different layers. Some layers have precision redundancy and can be quantized to lower bit widths such as binary or ternary, while layers that are sensitive to bit precision should be quantized to INT8. For object detection with Tiny-YOLOv3, the accelerator achieves 12.1 TOPS/W energy efficiency by quantizing the neural network to mixed precision while maintaining recognition accuracy. Furthermore, deep neural network computation requires large amounts of weight and input data, so memory access efficiency is critical given the limited memory bandwidth of a system-on-chip (SoC). By arranging data placement in memory so that the data can be scanned efficiently, the accelerator achieves over 90% access efficiency of the DRAM bandwidth. As a result, it achieves both low power consumption and high performance while maintaining recognition accuracy equivalent to 32-bit floating point (FP32) across many neural networks.
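As a rough illustration of the per-layer mixed-precision scheme described above, the sketch below quantizes precision-redundant layers to binary or ternary weights and precision-sensitive layers to INT8. The quantizer formulas, the ternary threshold factor, and the layer-to-precision mapping are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def quantize_binary(w: np.ndarray) -> np.ndarray:
    """Binary (1-bit) weights: +alpha or -alpha, with alpha set to the
    mean absolute weight so the layer's magnitude is roughly preserved."""
    alpha = np.abs(w).mean()
    return np.where(w >= 0, alpha, -alpha)

def quantize_ternary(w: np.ndarray, t: float = 0.7) -> np.ndarray:
    """Ternary weights: small values collapse to 0, the rest to +/-alpha.
    The threshold factor t=0.7 is a common heuristic, assumed here."""
    delta = t * np.abs(w).mean()
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return np.where(mask, np.sign(w) * alpha, 0.0)

def quantize_int8(w: np.ndarray) -> np.ndarray:
    """Symmetric INT8: round to integers in [-127, 127], then rescale."""
    scale = max(np.abs(w).max(), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale

# Hypothetical per-layer assignment: precision-redundant layers get low-bit
# weights, precision-sensitive layers stay at INT8.
LAYER_PRECISION = {"conv1": "int8", "conv2": "binary", "conv3": "ternary"}
QUANTIZERS = {"binary": quantize_binary, "ternary": quantize_ternary,
              "int8": quantize_int8}

def quantize_model(weights: dict) -> dict:
    """Apply the assigned quantizer to every layer's weight tensor."""
    return {name: QUANTIZERS[LAYER_PRECISION[name]](w)
            for name, w in weights.items()}
```

The abstract's second claim, over 90% DRAM bandwidth utilization through data placement, rests on arranging tensors in memory so the accelerator reads them as long sequential bursts rather than strided accesses. Below is a minimal sketch of one such repacking; the NHWC layout and the channel tile size are assumptions, not the paper's scheme.

```python
def tile_for_sequential_access(x: np.ndarray, tile_c: int = 16) -> np.ndarray:
    """Repack an NHWC activation tensor so that each tile of tile_c channels
    is stored contiguously, letting the accelerator fetch it in one long
    DRAM burst instead of many short strided reads."""
    n, h, w, c = x.shape
    assert c % tile_c == 0, "channel count must be a multiple of the tile size"
    tiled = x.reshape(n, h, w, c // tile_c, tile_c).transpose(0, 3, 1, 2, 4)
    return np.ascontiguousarray(tiled).ravel()
```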