Kernel-controlled DQN-based CNN Pruning for Model Compression and Acceleration

2020 
Apart from accuracy, the size of a Convolutional Neural Network (CNN) model is another principal factor in facilitating the deployment of models on memory-, power-, and budget-constrained devices. Conventional compression techniques require a human expert to set up parameters to explore the design space, and iterative pruning requires heavy training, which is sub-optimal and time-consuming. Given a CNN model, we propose a deep reinforcement learning [8] DQN-based automated compression method that effectively turns off kernels in each layer by observing their significance. By observing accuracy, compression ratio, and convergence rate, the proposed DQN model can automatically re-activate the healthiest kernels and retrain them to regain accuracy, which greatly improves compression quality. In experiments on the MNIST [3] dataset, our method compresses the convolution layers of a VGG-like [10] model by up to 60% with a 0.5% increase in test accuracy, using less than half the initial amount of training (speed-up of up to 2.5×); it achieves state-of-the-art results by dropping 80% of kernels (86% of parameters compressed) with a 0.14% increase in accuracy, and by further dropping 84% of kernels (94% of parameters compressed) with a loss of only 0.4% accuracy. The first proposed model, Auto-AEC (Accuracy-Ensured Compression), compresses the network while preserving or increasing its original accuracy, whereas the second proposed model, Auto-CECA (Compression-Ensured Considering the Accuracy), compresses the network as much as possible while preserving the original accuracy or incurring only a minimal accuracy drop. We further analyze the effectiveness of kernels in different layers based on how our model explores and exploits at various stages of training.
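Since the abstract only sketches the approach, the following is a minimal illustrative sketch in PyTorch of how a DQN agent might turn kernels off per layer. It is not the authors' code: the per-kernel features, the two-action (keep/prune) formulation, the epsilon-greedy policy, the 0.5/0.5 reward weighting, and names such as QNet, kernel_state, and prune_step are assumptions for illustration; replay-buffer training of the Q-network is omitted.

import torch
import torch.nn as nn

class QNet(nn.Module):
    # Q-network: per-kernel feature vector -> Q-values for {keep, prune}.
    def __init__(self, feat_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, x):
        return self.net(x)

def kernel_state(conv):
    # Per-kernel features for one conv layer: L1 norm, mean, std of weights.
    # (Assumed "significance" statistics; the paper's exact state is not given.)
    w = conv.weight.detach().flatten(1)           # (out_channels, in*k*k)
    return torch.stack([w.abs().sum(1), w.mean(1), w.std(1)], dim=1)

def apply_mask(conv, mask):
    # Turn kernels off by zeroing whole output filters (soft pruning),
    # so a later re-activation step could restore and retrain them.
    with torch.no_grad():
        conv.weight.mul_(mask.view(-1, 1, 1, 1))
        if conv.bias is not None:
            conv.bias.mul_(mask)

def prune_step(qnet, conv, epsilon=0.1):
    # One epsilon-greedy pruning decision per kernel, as in standard DQN.
    s = kernel_state(conv)
    q = qnet(s)                                   # (num_kernels, 2)
    greedy = q.argmax(dim=1)
    explore = torch.randint(0, 2, greedy.shape)
    pick_random = torch.rand(greedy.shape) < epsilon
    action = torch.where(pick_random, explore, greedy)  # 1 = prune
    mask = (action == 0).float()
    apply_mask(conv, mask)
    return s, action, mask

def reward(acc_after, acc_before, kernels_off, kernels_total):
    # Trade accuracy change against compression ratio; the equal weighting
    # is an assumption, not taken from the paper.
    return 0.5 * (acc_after - acc_before) + 0.5 * (kernels_off / kernels_total)

# Example usage on a single conv layer:
conv = nn.Conv2d(16, 32, kernel_size=3)
qnet = QNet()
s, a, mask = prune_step(qnet, conv)
print(f"kernels turned off: {int((mask == 0).sum())} / {mask.numel()}")

In a full training loop, the (state, action, reward, next state) tuples from repeated prune_step calls would be stored in a replay buffer to train the Q-network, with the reward above computed after a short fine-tuning pass.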