Layer-by-layer Quantization Method for Neural Network Parameters

2019 
Limited by the storage capacity and computing power of mobile devices, the deployment of neural networks on such devices progresses slowly. Quantizing a network's parameters not only reduces the storage the network requires but also simplifies the design of its arithmetic units, which facilitates the application of neural networks on mobile devices. This paper proposes a novel parameter quantization method that quantizes the weight data and output data of the network layer by layer, compressing the model while striking a good balance between model size and accuracy. On the MNIST dataset, the method achieves 7.62× model compression with an accuracy loss of only 0.13%.
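
For illustration, the sketch below shows what quantizing a network's weights and layer outputs layer by layer can look like in practice. The abstract does not specify the quantization scheme, so everything concrete here is an assumption rather than the paper's actual method: the symmetric uniform 8-bit quantizer, the `quantize_tensor` helper, and the fully connected ReLU layers are all hypothetical.

```python
# Minimal sketch of layer-by-layer quantization, assuming symmetric
# uniform fixed-point quantization with a per-tensor scale. This is an
# illustrative stand-in, not the quantizer described in the paper.
import numpy as np

def quantize_tensor(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Quantize a tensor to num_bits signed integers with one shared
    scale, then dequantize so the rest of the network stays in float."""
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 127 for 8 bits
    scale = max(float(np.max(np.abs(x))) / qmax, 1e-8)  # avoid div-by-0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)   # integer grid
    return q * scale                              # dequantized values

def forward_quantized(layers, x: np.ndarray, num_bits: int = 8):
    """Forward pass that quantizes each layer's weights and its output
    in turn, mirroring the layer-by-layer idea in the abstract."""
    for i, (w, b) in enumerate(layers):
        w_q = quantize_tensor(w, num_bits)        # quantize weights
        x = x @ w_q + b
        if i < len(layers) - 1:                   # ReLU on hidden layers
            x = np.maximum(x, 0.0)
        x = quantize_tensor(x, num_bits)          # quantize layer output
    return x

# Usage on a toy MNIST-shaped network (hypothetical sizes).
rng = np.random.default_rng(0)
layers = [(0.1 * rng.standard_normal((784, 128)), np.zeros(128)),
          (0.1 * rng.standard_normal((128, 10)), np.zeros(10))]
x = rng.standard_normal((1, 784))                 # one flattened image
print(forward_quantized(layers, x).shape)         # -> (1, 10)
```

Quantizing each layer's output as well as its weights keeps every intermediate tensor on a fixed-point grid, which is what allows the arithmetic units to be simplified; the compression ratio then follows from replacing 32-bit floats with low-bit integers plus one scale per tensor.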