TCP-Net: Minimizing Operation Counts of Binarized Neural Network Inference

2021 
Binarized neural network (BNN) dataflow inference accelerators have emerged as a promising solution for cost- and power-restricted domains, such as IoT and smart edge devices. However, there still exist abundant redundancies in BNN inference, which severely limit the performance of these accelerators. To alleviate this performance degradation, we propose TCP-Net, an efficient architecture that minimizes the number of operations in BNN inference while maintaining the original accuracy. Inspired by the observation that computing the outputs of multiple related kernels in a BNN involves significant repeated calculations, we first build a formula that bridges these outputs by exploiting kernel inclusion similarity, eliminating the unnecessary operations. Through a recursive algorithm, we further convert each original XNOR-popcount convolution into threshold-comparable-popcount (TCP) operations, which can be implemented to produce the final output directly, without any extra steps. Furthermore, we reduce the remaining TCP operation counts with a tile-based pruning strategy. Compared to prior state-of-the-art designs, TCP-Net saves 79.11 percent of the operations without any accuracy loss, yielding a 6.0× inference speedup and a 12.4× improvement in energy efficiency.
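To make the abstract's terminology concrete, the sketch below illustrates (not the authors' implementation) the standard XNOR-popcount formulation of a binary dot product and the commonly used threshold-comparison trick it alludes to: instead of computing the full ±1 sum and then applying sign/batch-norm, the binarized output is obtained directly by comparing the popcount against a precomputed threshold. All function names and the threshold value are illustrative assumptions.

```python
import numpy as np

def xnor_popcount_dot(x_bits, w_bits):
    """±1 dot product via XNOR-popcount.

    x_bits, w_bits: 1-D arrays of 0/1, where 0 encodes -1 and 1 encodes +1.
    XNOR of two bits is 1 exactly when the encoded ±1 values match, so the
    dot product equals (#matches) - (#mismatches) = 2*popcount(XNOR) - n.
    """
    n = x_bits.size
    matches = np.count_nonzero(x_bits == w_bits)  # XNOR then popcount
    return 2 * matches - n

def tcp_style_output(x_bits, w_bits, threshold):
    """Fused threshold comparison (illustrative): emit the binarized
    activation directly from the popcount, skipping the intermediate
    ±1 accumulation and any separate activation step."""
    matches = np.count_nonzero(x_bits == w_bits)
    return 1 if matches >= threshold else 0

# Example: x = [+1,-1,+1,+1], w = [+1,+1,+1,-1] in 0/1 encoding
x = np.array([1, 0, 1, 1])
w = np.array([1, 1, 1, 0])
print(xnor_popcount_dot(x, w))      # dot product of the ±1 vectors
print(tcp_style_output(x, w, 2))    # binarized output at threshold 2
```

The paper's contribution goes further (sharing computation across kernels via inclusion similarity and a recursive conversion), but this is the baseline operation those TCP operations replace.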