A 2.9-33.0 TOPS/W Reconfigurable 1D/2D Compute-Near-Memory Inference Accelerator in 10nm FinFET CMOS
2020
A 10-nm compute-near-memory (CNM) accelerator augments SRAM with multiply-accumulate (MAC) units to reduce interconnect energy and achieve 2.9 8b-TOPS/W for matrix–vector computation. The CNM provides high memory bandwidth by accessing SRAM subarrays in parallel, enabling low-latency, real-time inference in fully connected and recurrent neural networks with small mini-batch sizes. For workloads with greater arithmetic intensity, such as large-batch convolutional neural networks, the CNM reconfigures into a 2-D systolic array to amortize memory access energy over a greater number of computations. Variable-precision 8b/4b/2b/1b MACs increase throughput by up to 8× for binary operations at 33.0 1b-TOPS/W.
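To make the precision-throughput tradeoff concrete, below is a minimal Python sketch of the standard bit-packed formulation used for binary (1b) inference, where one wide MAC lane is reinterpreted as many single-bit operations. The lane-packing model and all function names are illustrative assumptions, not details from the paper; the XNOR-popcount dot product itself is the conventional encoding for {-1,+1} binary networks.

```python
# Illustrative model (assumption, not from the paper): an 8b MAC lane
# re-used as k parallel ops at (8/k)-bit precision gives up to 8x
# throughput in 1b mode, matching the paper's headline scaling.
def ops_per_lane(bits: int) -> int:
    assert bits in (8, 4, 2, 1)
    return 8 // bits

# 1b "MAC" as XNOR-popcount over bit-packed operands: {-1,+1} values
# are encoded as {0,1} bits, so agreements (XNOR) count +1 products.
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n  # map match count back to a {-1,+1} dot product

# Example: a = (+1,+1,-1,+1), b = (+1,-1,-1,+1) packed LSB-first.
assert binary_dot(0b1011, 0b1001, n=4) == 2  # +1 -1 +1 +1
assert ops_per_lane(1) == 8 * ops_per_lane(8)  # the claimed 8x speedup
```

Under this reading, the energy figures scale the same way: amortizing each SRAM access over eight 1b operations instead of one 8b operation is what lifts efficiency from 2.9 8b-TOPS/W toward 33.0 1b-TOPS/W.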