Efficient Tunstall Decoder for Deep Neural Network Compression

2021 
Power- and area-efficient deep neural network (DNN) designs are key in edge applications. Compact DNNs, obtained via compression or quantization, enable such designs by significantly reducing the memory footprint. Lossless entropy coding can further reduce the size of networks, so it is critical to provide hardware support for the entropy coding module to fully benefit from the resulting reduction in memory requirements. In this work, we introduce Tunstall coding to compress the quantized weights. Tunstall coding achieves a high compression ratio as well as very fast decoding speed on various deep networks. We present two hardware-accelerated decoding techniques that provide streamlined decoding capabilities. We synthesize these designs targeting an FPGA. Results show that we achieve up to 6× faster decoding than state-of-the-art decoding methods.
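The fast decoding speed of Tunstall coding follows from its variable-to-fixed structure: every codeword has the same bit width, and each one maps to a short phrase of source symbols, so decoding reduces to one table lookup per codeword with no bit-level tree traversal. The paper's two hardware decoder designs are not detailed in this abstract, so the C sketch below only illustrates that general table-lookup principle; the names (`tunstall_decode`, `phrase_t`), the 8-bit codeword width, the phrase-length limit, and the toy codebook are all illustrative assumptions, not the authors' implementation.

```c
/* Minimal sketch of table-lookup Tunstall decoding for quantized weights.
 * All sizes and names here are illustrative assumptions, not taken from
 * the paper. The codebook is assumed to be built offline by the encoder. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define CODE_BITS  8                  /* fixed codeword width (assumption) */
#define TABLE_SIZE (1 << CODE_BITS)
#define MAX_PHRASE 8                  /* longest phrase in the codebook    */

/* Each fixed-width codeword maps to a variable-length phrase of symbols. */
typedef struct {
    uint8_t symbols[MAX_PHRASE];      /* decoded quantized-weight symbols  */
    uint8_t length;                   /* number of valid symbols           */
} phrase_t;

/* Decode a stream of fixed-width codewords into quantized-weight symbols.
 * Returns the number of symbols written to `out`. */
size_t tunstall_decode(const uint8_t *codes, size_t n_codes,
                       const phrase_t table[TABLE_SIZE],
                       uint8_t *out, size_t out_cap)
{
    size_t written = 0;
    for (size_t i = 0; i < n_codes; i++) {
        const phrase_t *p = &table[codes[i]]; /* one lookup per codeword */
        for (uint8_t j = 0; j < p->length && written < out_cap; j++)
            out[written++] = p->symbols[j];
    }
    return written;
}

int main(void)
{
    static phrase_t table[TABLE_SIZE];        /* zero-initialized          */
    /* Toy codebook: codeword 0 -> "0 0 0", codeword 1 -> "1". */
    table[0] = (phrase_t){ .symbols = {0, 0, 0}, .length = 3 };
    table[1] = (phrase_t){ .symbols = {1},       .length = 1 };

    uint8_t codes[] = {0, 1, 0};
    uint8_t out[16];
    size_t n = tunstall_decode(codes, 3, table, out, sizeof out);

    for (size_t i = 0; i < n; i++)
        printf("%u ", out[i]);                /* prints: 0 0 0 1 0 0 0     */
    putchar('\n');
    return 0;
}
```

Because every iteration consumes exactly CODE_BITS input bits and performs a single indexed read, the loop has no data-dependent branching on the bitstream, which is what makes the scheme amenable to streamlined hardware decoding.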