BitPruner: network pruning for bit-serial accelerators
2020
Bit-serial architectures (BSAs) are becoming increasingly popular in low power neural network processor (NNP) design. However, the performance and efficiency of state-of-the-art BSA NNPs are heavily depending on the distribution of ineffectual weight-bits of the running neural network. To boost the efficiency of third-party BSA accelerators, this work presents Bit-Pruner, a software approach to learn BSA-favored neural networks without resorting to hardware modifications. The techniques proposed in this work not only progressively prune but also structure the non-zero bits in weights, so that the number of zero-bits in the model can be increased and also load-balanced to suit the architecture of the target BSA accelerators. According to our experiments on a set of representative neural networks, Bit-Pruner increases the bit-sparsity up to 94.4% with negligible accuracy degradation. When the bit-pruned models are deployed onto typical BSA accelerators, the average performance is 2.1X and 1.5X higher than the baselines running non-pruned and weight-pruned networks, respectively.
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
15
References
0
Citations
NaN
KQI