An Efficient Channel-Aware Sparse Binarized Neural Networks Inference Accelerator

2021 
Binarized neural network (BNN) inference accelerators show great promise in cost- and power-restricted domains. However, the performance of these accelerators is still severely limited by the significant redundancy in BNN inference. In this brief, we introduce a channel-aware sparse accelerator (CAA) to alleviate the performance degradation induced by this redundancy while maintaining the original accuracy. First, motivated by the observation that the convolution processes of our rebuilt rectangle kernels contain many redundant operations that can be skipped by exploiting a BNN-specific property, we employ a rectangle-kernel simplification strategy to convert the original XNOR-popcount convolutions of each neuron into channel-aware-popcount (CAP) operations for all binarized convolutional and fully connected layers in CAA, eliminating the unnecessary operations. These CAP operations directly yield the final output without any extra steps. Furthermore, inspired by two newly observed properties of the CAP operations, we adopt a group pruning approach to remove the remaining redundant CAP operations. Experimental results show that our design, evaluated on an embedded FPGA, achieves a 4.2-6.6× inference speedup, a 3.6-5.5× energy-efficiency improvement, and a 1.35× resource-efficiency improvement compared with state-of-the-art works.
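The abstract does not spell out the CAP operation itself, but the BNN-specific property it exploits follows from how a binarized neuron is computed: the output is sign(popcount − threshold), so a running popcount accumulated channel by channel can often decide the output before all bits are examined. Below is a minimal Python sketch of that early-exit idea; the function names, the channel-wise data layout, and the exit conditions are illustrative assumptions, not the paper's exact CAP scheme.

```python
import numpy as np

def xnor_popcount_neuron(w_bits, a_bits, threshold):
    # Baseline BNN neuron: XNOR of binary weights and activations,
    # popcount of the matching bits, then a threshold comparison.
    matches = int(np.sum(w_bits == a_bits))  # XNOR followed by popcount
    return 1 if matches >= threshold else 0

def channelwise_early_exit_neuron(w_chans, a_chans, threshold):
    # Accumulate the popcount one input channel at a time and stop as
    # soon as the running count already determines the binary output:
    # either the threshold has been reached, or even matching every
    # remaining bit could not reach it.
    remaining = sum(c.size for c in w_chans)
    matches = 0
    for w, a in zip(w_chans, a_chans):
        matches += int(np.sum(w == a))
        remaining -= w.size
        if matches >= threshold:
            return 1  # output already decided: skip remaining channels
        if matches + remaining < threshold:
            return 0  # threshold unreachable: skip remaining channels
    return 0

# Sanity check: the early-exit variant matches the full computation
# (64 input channels with hypothetical 3x3 binary kernels).
rng = np.random.default_rng(0)
w = [rng.integers(0, 2, 9) for _ in range(64)]
a = [rng.integers(0, 2, 9) for _ in range(64)]
assert channelwise_early_exit_neuron(w, a, 300) == \
       xnor_popcount_neuron(np.concatenate(w), np.concatenate(a), 300)
```

Because both exit conditions are exact bounds on the final popcount, skipping the remaining channels changes nothing about the output, which is consistent with the paper's claim of accelerating inference while maintaining the original accuracy.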