Research on Deep Neural Network Model Compression Based on Quantization, Pruning, and Huffman Encoding

2021 
With the rapid development of GPU hardware and the advent of the big-data era, neural networks have advanced quickly and greatly improved recognition performance in many fields. Applying neural networks to intelligent mobile and embedded military equipment will be part of the next wave of deep learning. However, neural networks are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we propose a new method called "deep compression", which consists of three stages: pruning, quantization, and Huffman encoding, to reduce the storage requirement of neural networks without affecting their original accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing; finally, we apply Huffman encoding. We evaluated our method on both MNIST and ImageNet. On the ImageNet dataset, our method reduced the storage of AlexNet by 35× without loss of accuracy and compressed the VGG-16 model by 49×, also with no loss of accuracy. Our method is an efficient solution for real-time multi-object recognition based on lightweight deep neural networks.
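The abstract does not include the authors' code; the following is a minimal sketch of the three-stage pipeline it describes, applied to a single weight matrix with NumPy. The pruning threshold, number of shared clusters, layer shape, and function names are illustrative assumptions, and the reported ratio covers only the Huffman-coded index stream (it ignores sparse-index and codebook overhead).

```python
# Hypothetical sketch of pruning -> weight-sharing quantization -> Huffman encoding.
import heapq
from collections import Counter

import numpy as np


def prune(weights, threshold=0.05):
    """Stage 1: magnitude pruning - zero out connections below the threshold (assumed value)."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask


def quantize(weights, mask, n_clusters=16):
    """Stage 2: weight sharing - cluster surviving weights with a simple 1-D k-means
    (Lloyd's iterations) so each weight is stored as a small cluster index."""
    nonzero = weights[mask]
    centroids = np.linspace(nonzero.min(), nonzero.max(), n_clusters)  # linear init
    for _ in range(20):
        idx = np.abs(nonzero[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = nonzero[idx == k].mean()
    return idx, centroids


def huffman_code_lengths(symbols):
    """Stage 3: Huffman encoding - derive per-symbol code lengths from frequencies
    to estimate the encoded size of the cluster-index stream in bits."""
    freq = Counter(symbols.tolist())
    heap = [(f, [s]) for s, f in freq.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freq}
    while len(heap) > 1:
        f1, group1 = heapq.heappop(heap)
        f2, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # each merge adds one bit to the codes of merged symbols
        heapq.heappush(heap, (f1 + f2, group1 + group2))
    return lengths, freq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.1, size=(256, 256))  # toy fully connected layer (assumed shape)

    pruned, mask = prune(w)
    idx, centroids = quantize(pruned, mask)
    lengths, freq = huffman_code_lengths(idx)

    dense_bits = w.size * 32  # original fp32 storage
    huffman_bits = sum(freq[s] * lengths[s] for s in freq)
    print(f"kept {mask.mean():.1%} of weights, {len(centroids)} shared values, "
          f"index-stream compression: {dense_bits / max(huffman_bits, 1):.1f}x")
```

In this sketch the three stages compose cleanly because each later stage only sees the output of the previous one: quantization operates on the surviving weights, and Huffman coding operates on the resulting cluster indices.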