Efficient Processing of Convolutional Neural Networks on SW26010

2019 
Artificial intelligence has developed rapidly in recent years. Deep neural networks are the basis of many artificial intelligence applications. How to accelerate the computational processing of deep neural networks is very important. To explor the potential for accelerating the process deep neural networks on various hardware platforms, we propose a convolutional neural network optimization method based on the Weight-Stationary for SW26010 processor. We re-circulate convolution loops and use hybrid DMA transmission mode to increase memory bandwidth and reduce memory access overhead. On top of those, further optimizations are done based on register communication, asynchronous DMA transfer double buffering, instruction scheduling and other schemes. Finally, we achieve a double-precision convolution performance over 2.4 Tflops, achieving 81% of the processor’s peak performance. In multiple parameters, we achieve a proforamnce acceleration of \(2.4-4.0\times \) speedup compared to the Tesla K80 GPU with cuDNNv7.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    5
    References
    0
    Citations
    NaN
    KQI
    []