A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA

2019 
Deep learning has driven the development of artificial intelligence and achieved notable successes across intelligent applications. Convolution-based layers (CLs), fully connected layers (FLs), and recurrent layers (RLs) are the three layer types in classic neural networks. Most intelligent tasks are implemented by hybrid neural networks (hybrid-NNs), which are commonly composed of different layer-blocks (LBs) of CLs, FLs, and RLs. Because CLs require the most computation in hybrid-NNs, many field-programmable gate array (FPGA)-based accelerators focus on CL acceleration and have demonstrated high performance. However, CL-oriented accelerators underutilize FPGA resources when accelerating the whole hybrid-NN. To fully exploit the logic resources and the memory bandwidth in the acceleration of CLs, FLs, and RLs, we propose an FPGA resource-efficient mapping mechanism for hybrid-NNs. The mechanism first improves the utilization of DSPs by integrating multiple small bit-width operations on one DSP. LB-level spatial mapping is then used to exploit the complementary features of the different neural networks in the hybrid-NN. We evaluate the mapping mechanism by implementing four hybrid-NNs on a Xilinx Virtex7 690T FPGA. The proposed mechanism achieves a peak performance of 1805.8 giga operations per second (GOPs). Analysis of resource utilization and throughput shows that the proposed method exploits more of the FPGA's computing power and achieves up to $4.13 \times$ higher throughput than the state-of-the-art acceleration.
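
The idea of integrating multiple small bit-width operations on one DSP can be illustrated with a minimal software sketch. This is not the paper's actual DSP mapping: the function name `packed_multiply`, the 8-bit operand width, and the 16-bit field spacing are all assumptions chosen for illustration. The sketch packs two signed operands that share a common multiplier into one wide word, performs a single multiplication, and then separates the two products.

```python
def packed_multiply(a: int, b: int, c: int, bits: int = 8) -> tuple[int, int]:
    """Return (a*c, b*c) using one wide multiplication.

    Hypothetical illustration: a, b, c are signed `bits`-wide integers.
    Each product fits in 2*bits signed bits, so the two products can
    share one wide result if the operands are spaced 2*bits apart.
    """
    lo_limit = -(1 << (bits - 1))
    hi_limit = (1 << (bits - 1)) - 1
    for value in (a, b, c):
        if not lo_limit <= value <= hi_limit:
            raise ValueError(f"{value} does not fit in {bits} signed bits")

    shift = 2 * bits                  # field spacing (assumed, not from the paper)
    packed = (a << shift) + b         # one wide operand holding a and b
    wide = packed * c                 # single multiply: a*c*2^shift + b*c

    # Recover the low field as a signed value of width `shift`.
    low = wide & ((1 << shift) - 1)
    if low >= 1 << (shift - 1):
        low -= 1 << shift

    # Removing the low product leaves a*c scaled by 2^shift.
    high = (wide - low) >> shift
    return high, low


if __name__ == "__main__":
    ac, bc = packed_multiply(3, -5, 7)
    print(ac, bc)
```

The subtraction of `low` before shifting is the sign correction: a negative low product borrows from the upper field, so the upper field cannot simply be read off by shifting. A hardware version would have to respect the real multiplier port widths of the target DSP slice, which this sketch does not model.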