A 9.0-TOPS/W Hash-Based Deep Neural Network Accelerator Enabling 128× Model Compression in 10-nm FinFET CMOS
2020
A 10-nm DNN inference accelerator compresses model size with tabulation hash-based fine-grained weight sharing and increases 8b compute density by 3.4× to 1.6 TOPS/mm². The compressed-model DNN implements lightweight hashing circuits to compress fully connected and recurrent neural networks. Optimized shared-weight address generation reduces MUX tree area overhead by 40%. Runtime hash table generation and weight mapping circuits enable a peak energy efficiency of 9.0 TOPS/W at 450 mV, 25 °C. A 128×-compressed 3-layer long short-term memory (LSTM) classifies TIMIT phonemes with 85.6% accuracy at a total energy of 14 μJ/classification, with <0.5% accuracy degradation relative to an uncompressed network.
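As a rough illustration of the compression scheme summarized above, the sketch below shows how tabulation hashing can map a large "virtual" weight matrix onto a small table of shared weights, in the spirit of hashed weight sharing. All names and sizes here (KEY_BITS, TABLE_SIZE, shared_weights, and so on) are illustrative assumptions, not parameters from the paper, which realizes this scheme in dedicated hardware rather than software.

```python
import numpy as np

# Hypothetical parameters for a software sketch of tabulation hash-based
# weight sharing; the paper's actual key widths and table sizes differ.
KEY_BITS = 16          # assumed flat weight-index width
CHUNK_BITS = 4         # tabulation hashing splits the key into 4-bit chunks
N_CHUNKS = KEY_BITS // CHUNK_BITS
TABLE_SIZE = 256       # assumed shared-weight table size (8-bit hash output)

rng = np.random.default_rng(0)
# One random lookup table per key chunk, each mapping a chunk to 8 hash bits.
hash_tables = rng.integers(0, TABLE_SIZE, size=(N_CHUNKS, 1 << CHUNK_BITS))
# The only weights actually stored: a small table of shared values.
shared_weights = rng.standard_normal(TABLE_SIZE).astype(np.float32)

def tabulation_hash(key: int) -> int:
    """XOR together one table entry per 4-bit chunk of the key."""
    h = 0
    for i in range(N_CHUNKS):
        chunk = (key >> (i * CHUNK_BITS)) & ((1 << CHUNK_BITS) - 1)
        h ^= hash_tables[i, chunk]
    return int(h)

def virtual_weight(row: int, col: int, n_cols: int) -> np.float32:
    """Fetch the shared weight that stands in for W[row, col]."""
    return shared_weights[tabulation_hash(row * n_cols + col)]

# A 256x256 virtual matrix backed by only 256 stored weights: 256x fewer
# unique weights, before counting the (small, fixed) hash-table overhead.
W = np.array([[virtual_weight(r, c, 256) for c in range(256)] for r in range(256)])
print(W.shape, shared_weights.nbytes, "bytes of unique weights")
```

Because the hash is computed from cheap table lookups and XORs, an index-to-weight mapping like this is inexpensive to evaluate on the fly, which is consistent with the runtime hash table generation and weight mapping circuits described in the abstract.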