Sparse Low Rank Factorization for Deep Neural Network Compression

2020 
Abstract Storing and processing millions of parameters in deep neural networks is highly challenging when deploying models in real-time applications on resource-constrained devices. The popular low-rank approximation approach, Singular Value Decomposition (SVD), is generally applied to the weights of fully connected layers, where compact storage is achieved by keeping only the most prominent components of the decomposed matrices. Years of research on pruning-based neural network model compression have revealed that the relative importance, or contribution, of the neurons in a layer varies greatly from one neuron to another. Recently, synapse pruning has also demonstrated that sparse weight matrices in a network architecture yield lower storage and faster computation at inference time. We extend these arguments by proposing that the low-rank decomposition of weight matrices should also account for the significance of both the input and output neurons of a layer. Combining the idea of sparsity with the unequal contributions of neurons towards the target task, we propose the Sparse Low Rank (SLR) method, which sparsifies the SVD matrices to achieve a better compression rate by keeping a lower rank for unimportant neurons. We demonstrate the effectiveness of our method in compressing well-known CNN image recognition architectures trained on popular datasets. Experimental results show that the proposed SLR approach outperforms vanilla truncated SVD and a pruning baseline, achieving better compression rates with minimal or no loss in accuracy. We provide the code of the proposed and comparative approaches, along with pretrained models, for review at https://github.com/slr-code; it will be made publicly available after publication of the manuscript.
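
To make the idea concrete, the following is a minimal sketch (not the authors' released code) of how a fully connected layer's weight matrix could be compressed by truncated SVD while keeping a lower effective rank for neurons deemed unimportant, as the abstract describes. The function name, the importance masks, and the choice of ranks are illustrative assumptions, not part of the paper.

```python
# Minimal sketch, assuming NumPy and randomly chosen "important" neurons;
# in the paper, importance would come from a neuron-significance criterion.
import numpy as np

def sparse_low_rank(W, rank, keep_rank, important_in, important_out):
    """Approximate W (out_dim x in_dim) with sparsified SVD factors.

    rank:       rank retained for important neurons.
    keep_rank:  smaller rank retained for unimportant neurons (keep_rank < rank).
    important_in / important_out: boolean masks over input / output neurons.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    # Zero the trailing components for unimportant neurons, making the
    # factors sparse so they can be stored in a compressed sparse format.
    U_r[~important_out, keep_rank:] = 0.0
    V_r[keep_rank:, ~important_in] = 0.0
    return U_r, V_r

# Usage: compress a 512x1024 layer, keeping rank 64 for important neurons
# and rank 16 for the rest (masks are random here purely for illustration).
W = np.random.randn(512, 1024)
imp_out = np.random.rand(512) > 0.5
imp_in = np.random.rand(1024) > 0.5
U_r, V_r = sparse_low_rank(W, rank=64, keep_rank=16,
                           important_in=imp_in, important_out=imp_out)
W_hat = U_r @ V_r  # approximate weights used at inference time
```

Stored as sparse matrices, U_r and V_r occupy less space than a uniform rank-64 factorization, which is the compression gain the abstract attributes to SLR over vanilla truncated SVD.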