A 9.0-TOPS/W Hash-Based Deep Neural Network Accelerator Enabling 128× Model Compression in 10-nm FinFET CMOS
2020
A 10-nm DNN inference accelerator compresses model size with tabulation hash-based fine-grained weight sharing and increases 8b compute density by 3.4× to 1.6 TOPS/mm². The compressed-model DNN implements lightweight hashing circuits to compress fully connected and recurrent neural networks. Optimized shared-weight address generation reduces MUX tree area overhead by 40%. Runtime hash table generation and weight mapping circuits enable a peak energy efficiency of 9.0 TOPS/W at 450 mV, 25 °C. A 128×-compressed 3-layer long short-term memory (LSTM) classifies TIMIT phonemes with 85.6% accuracy at a total energy of 14 μJ/classification, with <0.5% accuracy degradation relative to an uncompressed network.
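As a rough illustration of the compression scheme summarized above, the sketch below shows how tabulation hashing can map a large "virtual" weight matrix onto a small table of shared weights, in the spirit of hashed weight sharing. All names and sizes here (KEY_BITS, TABLE_SIZE, shared_weights, and so on) are illustrative assumptions, not parameters from the paper, which realizes this scheme in dedicated hardware rather than software.

```python
import numpy as np

# Hypothetical parameters for a software sketch of tabulation hash-based
# weight sharing; the paper's actual key widths and table sizes differ.
KEY_BITS = 16          # assumed flat weight-index width
CHUNK_BITS = 4         # tabulation hashing splits the key into 4-bit chunks
N_CHUNKS = KEY_BITS // CHUNK_BITS
TABLE_SIZE = 256       # assumed shared-weight table size (8-bit hash output)

rng = np.random.default_rng(0)
# One random lookup table per key chunk, each mapping a chunk to 8 hash bits.
hash_tables = rng.integers(0, TABLE_SIZE, size=(N_CHUNKS, 1 << CHUNK_BITS))
# The only weights actually stored: a small table of shared values.
shared_weights = rng.standard_normal(TABLE_SIZE).astype(np.float32)

def tabulation_hash(key: int) -> int:
    """XOR together one table entry per 4-bit chunk of the key."""
    h = 0
    for i in range(N_CHUNKS):
        chunk = (key >> (i * CHUNK_BITS)) & ((1 << CHUNK_BITS) - 1)
        h ^= hash_tables[i, chunk]
    return int(h)

def virtual_weight(row: int, col: int, n_cols: int) -> np.float32:
    """Fetch the shared weight that stands in for W[row, col]."""
    return shared_weights[tabulation_hash(row * n_cols + col)]

# A 256x256 virtual matrix backed by only 256 stored weights: 256x fewer
# unique weights, before counting the (small, fixed) hash-table overhead.
W = np.array([[virtual_weight(r, c, 256) for c in range(256)] for r in range(256)])
print(W.shape, shared_weights.nbytes, "bytes of unique weights")
```

Because the hash is computed from cheap table lookups and XORs, an index-to-weight mapping like this is inexpensive to evaluate on the fly, which is consistent with the runtime hash table generation and weight mapping circuits described in the abstract.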