Hardware accelerators for recurrent neural networks on FPGA

2017 
Recurrent Neural Networks (RNNs) have the ability to retain memory and learn from data sequences, which are fundamental for real-time applications. RNN computations offer limited data reuse, which leads to high data traffic. This translates into high off-chip memory bandwidth or large internal storage requirement to achieve high performance. Exploiting parallelism in RNN computations are bounded by this two limiting factors, among other constraints present in embedded systems. Therefore, balance between internally stored data and off-chip memory data transfer is necessary to overlap computation time with data transfer latency. In this paper, we present three hardware accelerators for RNN on Xilinx's Zynq SoC FPGA to present how to overcome challenges involved in developing RNN accelerators. Each design uses different strategies to achieve high performance and scalability. Each co-processor was tested with a character level language model. The latest design called DeepRnn, achieves up to 23 X better performance per power than Tegra X1 development board for this application.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    73
    Citations
    NaN
    KQI
    []