ATT: A Fault-Tolerant ReRAM Accelerator for Attention-based Neural Networks

2020 
Crossbar-based resistive RAM (ReRAM) has been widely used in deep learning accelerator designs because it largely eliminates weight movement between memory and processing units. Its high-density storage and low leakage power make it a good fit for edge/IoT devices. However, existing ReRAM designs for traditional neural networks cannot support attention-based neural networks, which are built from stacks of encoders and decoders rather than convolutional or fully connected layers. In addition to the matrix-matrix multiplications found in traditional neural networks, an encoder or decoder also includes the attention mechanism, layer normalization, and the Gaussian error linear unit (GELU). These new characteristics make the data flow far more complicated than that of a convolutional layer. Faulty ReRAM cells pose an additional obstacle when mapping weights, since they severely degrade computation accuracy, and existing hardware redundancy strategies that are unaware of application characteristics usually result in inefficient designs. In this work, we analyze the data flow of attention-based neural networks and propose ATT, a ReRAM-based accelerator with a dedicated pipeline design for these networks. To handle crossbar cells with hard faults, we further propose NuXG, a non-uniform redundancy strategy that meets accuracy requirements while saving energy by decreasing the redundancy ratio. Our evaluation demonstrates that the proposed design achieves more than a two-fold improvement over existing redundancy schemes in both power efficiency and throughput for attention-based neural networks, and that it also significantly outperforms an NVIDIA GPU.
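To illustrate the operations the abstract enumerates, the minimal NumPy sketch below computes scaled dot-product attention, layer normalization, and GELU using their standard textbook definitions. It is not the ATT pipeline or its crossbar mapping; all array shapes and names are illustrative assumptions.

import numpy as np

def attention(X, Wq, Wk, Wv):
    # Q, K, V projections are the matrix-matrix multiplications that
    # map naturally onto ReRAM crossbars.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over the key dimension: a data-dependent step absent
    # from convolutional or fully connected layers.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def layer_norm(X, eps=1e-5):
    # Normalize over the feature dimension (learned scale/shift omitted).
    mu = X.mean(axis=-1, keepdims=True)
    var = X.var(axis=-1, keepdims=True)
    return (X - mu) / np.sqrt(var + eps)

def gelu(X):
    # tanh approximation of the Gaussian error linear unit.
    return 0.5 * X * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (X + 0.044715 * X**3)))

# Toy encoder sub-layer: attention -> residual + layer norm -> GELU.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                        # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = gelu(layer_norm(X + attention(X, Wq, Wk, Wv)))
print(out.shape)                                       # (4, 8)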