A 2.9-33.0 TOPS/W Reconfigurable 1D/2D Compute-Near-Memory Inference Accelerator in 10nm FinFET CMOS
2020
A 10-nm compute-near-memory (CNM) accelerator augments SRAM with multiply-accumulate (MAC) units to reduce interconnect energy and achieve 2.9 8b-TOPS/W for matrix–vector computation. The CNM provides high memory bandwidth by accessing SRAM subarrays in parallel, enabling low-latency, real-time inference in fully connected and recurrent neural networks with small mini-batch sizes. For workloads with greater arithmetic intensity, such as large-batch convolutional neural networks, the CNM reconfigures into a 2-D systolic array to amortize memory access energy over a greater number of computations. Variable-precision 8b/4b/2b/1b MACs increase throughput by up to 8× for binary operations at 33.0 1b-TOPS/W.
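To make the precision-throughput tradeoff concrete, below is a minimal Python sketch of the standard bit-packed formulation used for binary (1b) inference, where one wide MAC lane is reinterpreted as many single-bit operations. The lane-packing model and all function names are illustrative assumptions, not details from the paper; the XNOR-popcount dot product itself is the conventional encoding for {-1,+1} binary networks.

```python
# Illustrative model (assumption, not from the paper): an 8b MAC lane
# re-used as k parallel ops at (8/k)-bit precision gives up to 8x
# throughput in 1b mode, matching the paper's headline scaling.
def ops_per_lane(bits: int) -> int:
    assert bits in (8, 4, 2, 1)
    return 8 // bits

# 1b "MAC" as XNOR-popcount over bit-packed operands: {-1,+1} values
# are encoded as {0,1} bits, so agreements (XNOR) count +1 products.
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n  # map match count back to a {-1,+1} dot product

# Example: a = (+1,+1,-1,+1), b = (+1,-1,-1,+1) packed LSB-first.
assert binary_dot(0b1011, 0b1001, n=4) == 2  # +1 -1 +1 +1
assert ops_per_lane(1) == 8 * ops_per_lane(8)  # the claimed 8x speedup
```

Under this reading, the energy figures scale the same way: amortizing each SRAM access over eight 1b operations instead of one 8b operation is what lifts efficiency from 2.9 8b-TOPS/W toward 33.0 1b-TOPS/W.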