A Hybrid Precision Low Power Computing-In-Memory Architecture for Neural Networks

2020 
Abstract Recently, non-volatile memory-based computing-in-memory has been regarded as a promising candidate for ultra-low-power AI chips. Implementations based on both binarized (BIN) and multi-bit (MB) schemes have been proposed for DNNs/CNNs. However, both schemes face challenges in accuracy and power efficiency in practical use. This paper proposes a hybrid precision architecture and circuit-level techniques to overcome these challenges. According to measured experimental results, a test chip based on the proposed architecture achieves (1) configurable precision ranging from binarized weights and inputs up to 8-bit inputs, 5-bit weights, and 7-bit outputs, (2) a reduction in accuracy loss of 86% to 96% for multiple complex CNNs, and (3) a power efficiency of 2.15 TOPS/W in a 0.22 μm CMOS process, which greatly reduces cost compared with digital designs of similar power efficiency. With a more advanced process, the architecture can achieve higher power efficiency; according to our estimation, over 20 TOPS/W can be achieved with a 55 nm CMOS process.
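
The precision range reported above (binarized weights/inputs up to 8-bit inputs, 5-bit weights, and 7-bit outputs) can be illustrated with a minimal software sketch of a hybrid-precision multiply-accumulate. The sketch below assumes symmetric uniform quantization and a simple output truncation scheme; the function names, scaling, and rounding choices are illustrative assumptions, not the paper's circuit implementation.

```python
# Minimal sketch of a hybrid-precision MAC, assuming symmetric uniform
# quantization. Bit widths follow the ranges reported for the test chip;
# everything else (scaling, truncation) is an illustrative assumption.
import numpy as np

def quantize(x, bits):
    """Quantize a float array to signed integers of the given bit width.
    bits == 1 maps to a binarized {-1, +1} representation."""
    if bits == 1:
        return np.where(x >= 0, 1, -1), 1.0
    qmax = 2 ** (bits - 1) - 1
    peak = np.max(np.abs(x))
    scale = peak / qmax if peak > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32), scale

def hybrid_mac(inputs, weights, in_bits=8, w_bits=5, out_bits=7):
    """Dot product with quantized inputs/weights and a truncated output,
    covering the binarized-to-multi-bit range (e.g. 8b in, 5b weight, 7b out)."""
    qi, si = quantize(inputs, in_bits)
    qw, sw = quantize(weights, w_bits)
    acc = np.dot(qi, qw)                               # integer accumulation
    out_max = 2 ** (out_bits - 1) - 1
    shift = max(int(acc).bit_length() - (out_bits - 1), 0)
    acc_q = np.clip(acc >> shift, -out_max, out_max)   # fit into out_bits
    return acc_q * si * sw * (2 ** shift)              # rescale to real value

x = np.random.randn(64).astype(np.float32)
w = np.random.randn(64).astype(np.float32)
print("multi-bit :", hybrid_mac(x, w, 8, 5, 7))
print("binarized :", hybrid_mac(x, w, 1, 1, 7))
```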