A 2.9-33.0 TOPS/W Reconfigurable 1D/2D Compute-Near-Memory Inference Accelerator in 10nm FinFET CMOS

2020 
A 10-nm compute-near-memory (CNM) accelerator augments SRAM with multiply-accumulate (MAC) units to reduce interconnect energy and achieve 2.9 8b-TOPS/W for matrix–vector computation. The CNM provides high memory bandwidth by accessing SRAM subarrays to enable low-latency, real-time inference in fully connected and recurrent neural networks with small mini-batch sizes. For workloads with greater arithmetic intensity, such as large-batch convolutional neural networks, the CNM reconfigures into a 2-D systolic array to amortize memory access energy over a greater number of computations. Variable-precision 8b/4b/2b/1b MACs increase throughput by up to $8\times$ for binary operations at 33.0 1b-TOPS/W.
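The sketch below is a minimal, illustrative model of the two ideas in the abstract: choosing between the 1-D compute-near-memory mode and the 2-D systolic mode based on arithmetic intensity (weight reuse across the mini-batch), and the throughput scaling from variable-precision MACs. The crossover threshold and the function names are hypothetical placeholders for illustration, not values or interfaces from the paper.

```python
def ops_per_weight_fetch(batch: int) -> int:
    """Arithmetic-intensity proxy: MACs performed per weight fetched.

    In a matrix-vector or matrix-matrix product, each weight is reused
    once per sample in the mini-batch.
    """
    return batch


def choose_mode(batch: int, threshold: int = 16) -> str:
    # Hypothetical crossover: small batches favor near-memory 1-D MACs,
    # where SRAM bandwidth dominates; large batches favor the 2-D systolic
    # array, where weight reuse amortizes SRAM access energy.
    return "1D-CNM" if ops_per_weight_fetch(batch) < threshold else "2D-systolic"


def relative_throughput(precision_bits: int, base_bits: int = 8) -> int:
    # Variable-precision MACs: halving operand width roughly doubles
    # throughput, giving up to 8x for 1-b (binary) operation vs. 8-b.
    assert precision_bits in (1, 2, 4, 8)
    return base_bits // precision_bits


if __name__ == "__main__":
    for batch in (1, 4, 64):
        print(f"batch={batch:3d} -> {choose_mode(batch)}")
    for bits in (8, 4, 2, 1):
        print(f"{bits}b MACs -> {relative_throughput(bits)}x throughput")
```

Running the sketch shows small batches mapped to the 1-D near-memory mode, large batches to the 2-D systolic mode, and the 8×/4×/2×/1× throughput ladder from 1-b down to 8-b precision, consistent with the 2.9–33.0 TOPS/W range in the title.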