Layer-by-layer Quantization Method for Neural Network Parameters
2019
Limited by the storage and computing power of mobile devices, the deployment of neural networks on them remains slow. Quantizing the parameters of a neural network not only reduces the storage the network requires but also simplifies the design of its arithmetic units, which facilitates deploying neural networks on mobile devices. This paper proposes a novel parameter quantization method that quantizes the weights and output activations of the network layer by layer, compressing the model while striking a good balance between model size and accuracy. On the MNIST dataset, the method achieves 7.62x model compression with an accuracy loss of only 0.13%.
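The abstract does not specify the quantization scheme, but the layer-by-layer idea can be sketched with a simple uniform quantizer: each layer's weights get their own scale factor, are rounded to low-bit integers, and are dequantized at inference time. The function names, bit width, and per-layer max-abs scaling below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def quantize_layer(weights, num_bits=8):
    """Uniform quantization of one layer's weights (hypothetical sketch).

    Maps floats to signed integers using a single per-layer scale,
    so each layer is compressed independently of the others.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax  # one scale per layer
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return q.astype(np.float32) * scale

# Quantize each layer independently (layer by layer).
rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 4)) for _ in range(3)]
compressed = [quantize_layer(w) for w in layers]
restored = [dequantize(q, s) for q, s in compressed]

# Worst-case reconstruction error across all layers; for uniform
# quantization it is bounded by half the per-layer scale.
max_err = max(np.max(np.abs(w - r)) for w, r in zip(layers, restored))
```

Storing int8 codes plus one float scale per layer in place of float32 weights is what yields the roughly 4-8x compression range the abstract's 7.62x figure falls into, with the bit width controlling the size/accuracy trade-off.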