Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator
2018
Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition. Deep networks with >50 M parameters are made possible by modern graphics processing unit clusters. Using an analog resistive memory (ReRAM) crossbar to perform the key matrix operations of neural network training offers a path beyond purely digital approaches, and this work presents a detailed circuit- and device-level analysis of the energy, latency, area, and accuracy of such an analog accelerator block. The analog block is shown to have a $270\times$ energy and $540\times$ latency advantage over a similar block utilizing only digital ReRAM, and takes only 11 fJ per multiply and accumulate (MAC). Compared with an SRAM-based accelerator, the energy is $430\times$ better and the latency is $34\times$ better. Although training accuracy is degraded in the analog accelerator, several options to improve this are presented. The possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable. This detailed circuit and device analysis of a training accelerator may serve as a foundation for further architecture-level studies.
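As a quick sanity check of the abstract's figures (this arithmetic is a sketch derived from the quoted numbers, not a calculation from the paper itself), the reported 11 fJ per analog MAC together with the quoted energy ratios implies the per-MAC energy of the two digital baselines:

```python
# Back-of-the-envelope check of the abstract's figures.
# Starting from the reported 11 fJ per analog multiply-and-accumulate (MAC),
# the quoted energy ratios imply the per-MAC energies of the digital baselines.

ANALOG_MAC_FJ = 11.0  # reported energy per analog MAC, in femtojoules

# 270x energy advantage over a digital-ReRAM block; 430x over an SRAM block
digital_reram_fj = ANALOG_MAC_FJ * 270
sram_fj = ANALOG_MAC_FJ * 430

print(f"Implied digital-ReRAM energy per MAC: {digital_reram_fj / 1000:.2f} pJ")  # ~2.97 pJ
print(f"Implied SRAM energy per MAC:          {sram_fj / 1000:.2f} pJ")           # ~4.73 pJ
```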