BitPruner: network pruning for bit-serial accelerators

2020 
Bit-serial architectures (BSAs) are becoming increasingly popular in low power neural network processor (NNP) design. However, the performance and efficiency of state-of-the-art BSA NNPs are heavily depending on the distribution of ineffectual weight-bits of the running neural network. To boost the efficiency of third-party BSA accelerators, this work presents Bit-Pruner, a software approach to learn BSA-favored neural networks without resorting to hardware modifications. The techniques proposed in this work not only progressively prune but also structure the non-zero bits in weights, so that the number of zero-bits in the model can be increased and also load-balanced to suit the architecture of the target BSA accelerators. According to our experiments on a set of representative neural networks, Bit-Pruner increases the bit-sparsity up to 94.4% with negligible accuracy degradation. When the bit-pruned models are deployed onto typical BSA accelerators, the average performance is 2.1X and 1.5X higher than the baselines running non-pruned and weight-pruned networks, respectively.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    15
    References
    0
    Citations
    NaN
    KQI
    []