A 9.0TOPS/W Hash-Based Deep Neural Network Accelerator Enabling 128× Model Compression in 10nm FinFET CMOS

2020 
A 10-nm DNN inference accelerator compresses model size with tabulation hash-based fine-grained weight sharing and increases 8b-compute density by 3.4× to 1.6 TOPS/mm². The compressed-model DNN implements lightweight hashing circuits to compress fully connected and recurrent neural networks. Optimized shared-weight address generation reduces MUX-tree area overhead by 40%. Runtime hash-table generation and weight-mapping circuits enable a peak energy efficiency of 9.0 TOPS/W at 450 mV, 25 °C. A 128×-compressed three-layer long short-term memory network classifies TIMIT phonemes with 85.6% accuracy at a total energy of 14 μJ/classification, with <0.5% accuracy degradation relative to the uncompressed network.
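The core idea behind the compression is that a fixed tabulation hash maps every weight position of a layer into a small table of shared weights, so only the shared-weight table (and the small hash tables) must be stored. The following is a minimal software sketch of that scheme in the spirit of hash-based weight sharing; it is not the paper's hardware implementation, and all names, table sizes, and the NumPy framing are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

CHUNK_BITS = 8                 # split each flattened weight index into 8-bit chunks
NUM_CHUNKS = 4                 # enough chunks to cover a 32-bit index
SHARED_WEIGHTS = 256           # size of the shared-weight table (the compression knob)

# Tabulation hash: one random lookup table of hash codes per key chunk.
tab = rng.integers(0, SHARED_WEIGHTS, size=(NUM_CHUNKS, 1 << CHUNK_BITS), dtype=np.uint32)

def tabulation_hash(key: int) -> int:
    """XOR together one table lookup per 8-bit chunk of the key."""
    h = 0
    for c in range(NUM_CHUNKS):
        chunk = (key >> (c * CHUNK_BITS)) & ((1 << CHUNK_BITS) - 1)
        h ^= int(tab[c, chunk])
    return h % SHARED_WEIGHTS

# Shared-weight table: the only trained weight storage for the layer.
shared = rng.standard_normal(SHARED_WEIGHTS).astype(np.float32)

def hashed_fc(x: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Compute y = W @ x where W[i, j] = shared[hash(i * cols + j)];
    the dense weight matrix W is never materialized."""
    y = np.zeros(rows, dtype=np.float32)
    for i in range(rows):
        for j in range(cols):
            y[i] += shared[tabulation_hash(i * cols + j)] * x[j]
    return y

x = rng.standard_normal(64).astype(np.float32)
print(hashed_fc(x, rows=32, cols=64).shape)   # (32,)
```

With these illustrative sizes, a rows × cols weight matrix is replaced by SHARED_WEIGHTS stored values, so the compression ratio scales roughly as (rows × cols) / SHARED_WEIGHTS; choosing the table size relative to the layer dimensions is what sets ratios such as the 128× reported in the paper.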