Rate-Distortion Optimized Coding for Efficient CNN Compression

2021 
In this paper, we present a coding framework for deep convolutional neural network compression. Our approach draws on classical coding theory and formulates the compression of deep convolutional neural networks as a rate-distortion optimization problem. We incorporate three coding ingredients into the framework, namely bit allocation, dead-zone quantization, and Tunstall coding, to improve the rate-distortion frontier without introducing noticeable system-level overhead. Experimental results show that our approach achieves state-of-the-art results on various deep convolutional neural networks and obtains considerable speedup on two deep learning accelerators. Specifically, our approach achieves a 20× compression ratio on ResNet-18, ResNet-34, and ResNet-50, and a 10× compression ratio on the already compact MobileNet-v2, without hurting accuracy. We then examine the system-level impact of our approach when deploying the compressed models to hardware platforms. Hardware simulation results show that our approach obtains up to 4.3× and 2.8× inference speedup on the state-of-the-art deep learning accelerators TPU and Eyeriss, respectively.
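To make the coding ingredients named above more concrete, the following is a minimal, self-contained sketch of dead-zone scalar quantization combined with a Lagrangian rate-distortion criterion for choosing a per-layer quantization step. It is not the authors' implementation: the function names (dead_zone_quantize, rd_cost, allocate_step), the dead-zone width parameter, the entropy-based rate proxy, and the candidate step grid are all illustrative assumptions, and the Tunstall entropy coding of the quantization indices is not shown.

```python
import numpy as np

def dead_zone_quantize(weights, step, dead_zone_ratio=1.5):
    """Dead-zone scalar quantization: values inside the widened zero bin are
    mapped to zero; the remaining magnitudes are uniformly quantized.
    (Illustrative sketch; parameters are assumptions, not the paper's values.)"""
    dead_zone = dead_zone_ratio * step / 2.0
    magnitude = np.abs(weights)
    # Quantization level index: 0 inside the dead zone, uniform bins outside.
    indices = np.where(
        magnitude <= dead_zone,
        0.0,
        np.floor((magnitude - dead_zone) / step) + 1.0,
    )
    return np.sign(weights) * indices * step  # reconstructed weights

def rd_cost(weights, step, lam, dead_zone_ratio=1.5):
    """Lagrangian cost D + lambda * R for one layer, with the rate
    approximated by the entropy of the reconstructed-value histogram."""
    q = dead_zone_quantize(weights, step, dead_zone_ratio)
    distortion = np.mean((weights - q) ** 2)
    _, counts = np.unique(q, return_counts=True)
    probs = counts / counts.sum()
    rate = -np.sum(probs * np.log2(probs))  # bits per weight (entropy proxy)
    return distortion + lam * rate

def allocate_step(weights, lam, candidate_steps):
    """Bit allocation by exhaustive search: pick the per-layer step size
    that minimizes the rate-distortion cost at the given lambda."""
    costs = [rd_cost(weights, s, lam) for s in candidate_steps]
    return candidate_steps[int(np.argmin(costs))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer_weights = rng.normal(scale=0.05, size=10_000)  # stand-in for a conv layer
    steps = np.linspace(0.005, 0.1, 20)
    best_step = allocate_step(layer_weights, lam=0.01, candidate_steps=steps)
    print("selected quantization step:", best_step)
```

In a full pipeline, the quantization indices produced per layer would then be entropy coded (e.g., with Tunstall codes, as the abstract states) and the Lagrange multiplier swept to trace out the rate-distortion frontier.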