Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator

2018 
Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition. Deep networks with >50 M parameters are made possible by modern graphics processing unit clusters, but further gains in performance per watt call for new hardware approaches. This work analyzes an accelerator block that uses an analog resistive random-access memory (ReRAM) crossbar to perform the key matrix operations of neural network training. The analog block achieves a $270\times $ energy and $540\times $ latency advantage over a similar block utilizing only digital ReRAM and takes only 11 fJ per multiply and accumulate. Compared with an SRAM-based accelerator, the energy is $430\times $ better and the latency is $34\times $ better. Although training accuracy is degraded in the analog accelerator, several options to improve this are presented. The possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable. This detailed circuit- and device-level analysis of a training accelerator may serve as a foundation for further architecture-level studies.
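As a rough back-of-the-envelope check derived from the figures above (not stated explicitly in the abstract), the reported ratios imply approximate per-MAC energies for the comparison designs:

$$E_{\text{digital ReRAM}} \approx 270 \times 11\,\text{fJ} \approx 3.0\,\text{pJ per MAC}, \qquad E_{\text{SRAM}} \approx 430 \times 11\,\text{fJ} \approx 4.7\,\text{pJ per MAC}.$$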