SWIS -- Shared Weight bIt Sparsity for Efficient Neural Network Acceleration.

Shurui Li, W. Romaszkan, Alexander Graening, Puneet Gupta

2021

Quantization is spearheading the increase in performance and efficiency of neural network computing systems that are making headway into commodity hardware. We present SWIS (Shared Weight bIt Sparsity), a quantization framework for efficient neural network inference acceleration that delivers improved performance and storage compression through an offline weight decomposition and scheduling algorithm. SWIS can achieve up to a 54.3 (19.8) percentage-point accuracy improvement compared to weight truncation when quantizing MobileNet-v2 to 4 (2) bits post-training (with retraining), showing the strength of leveraging shared bit-sparsity in weights. The SWIS accelerator gives up to 6x speedup and 1.9x energy improvement over state-of-the-art bit-serial architectures.
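To make the contrast between weight truncation and shared bit-sparsity concrete, the toy sketch below may help. It is not the SWIS decomposition or scheduling algorithm, which the abstract does not describe. It assumes unsigned 8-bit weight magnitudes, treats truncation as keeping the N most significant bits, and treats shared bit-sparsity as picking N bit positions that every weight in a group must share, found here by brute-force search on squared error. The names `truncate`, `quantize_to_positions`, and `shared_bit_quantize` are hypothetical.

```python
from itertools import combinations

BITS = 8  # assumed magnitude bit width for this toy example


def truncate(w, n):
    """Keep only the n most significant of BITS bits (assumed truncation baseline)."""
    drop = BITS - n
    return (w >> drop) << drop


def quantize_to_positions(w, positions):
    """Return the value closest to w that uses only the given bit positions."""
    candidates = []
    for r in range(len(positions) + 1):
        for subset in combinations(positions, r):
            candidates.append(sum(1 << p for p in subset))
    # Ties are broken toward the smaller candidate.
    return min(candidates, key=lambda c: (abs(c - w), c))


def shared_bit_quantize(group, n):
    """Brute-force search for n bit positions shared by the whole group.

    Returns (squared_error, positions, quantized_group). This exhaustive
    search is an illustrative stand-in, not the paper's offline algorithm.
    """
    best = None
    for positions in combinations(range(BITS), n):
        q = [quantize_to_positions(w, positions) for w in group]
        err = sum((a - b) ** 2 for a, b in zip(group, q))
        if best is None or err < best[0]:
            best = (err, positions, q)
    return best


if __name__ == "__main__":
    group = [3, 5, 6, 2]  # small-magnitude weights, made up for illustration
    n = 2
    trunc = [truncate(w, n) for w in group]
    trunc_err = sum((a - b) ** 2 for a, b in zip(group, trunc))
    err, positions, shared = shared_bit_quantize(group, n)
    print("truncation:", trunc, "squared error", trunc_err)
    print("shared bits:", shared, "positions", positions, "squared error", err)
```

For small-magnitude weights like these, truncation to the two top bits zeroes every weight, while two shared low-order positions preserve most of the magnitude. That gap is the intuition behind the accuracy difference the abstract reports, under the assumptions stated above.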

Keywords:

Quantization (signal processing)
Artificial neural network
Computer science
Inference
Acceleration
Speedup
Truncation
Headway
Scheduling (computing)
Computer engineering
