Cache Compression with Golomb-Rice Code and Quantization for Convolutional Neural Networks

2021 
Cache compression schemes reduce the cache miss rate by increasing the effective cache capacity and, consequently, reduce memory accesses and power consumption. Cache compression is therefore beneficial for applications with heavy memory traffic, including convolutional neural networks (CNNs). In this paper, a new cache compression scheme for floating-point numbers is proposed for CNNs. The exponent is compressed with the Golomb-Rice code, instead of the Huffman code, for an efficient hardware implementation. The compression syntax is carefully designed, by distinguishing the two different types of data used in CNNs, so that the size of the compressed data stays close to the entropy, which is the theoretical limit. Since the mantissa of CNN data can hardly be compressed by entropy coding, it is instead quantized for data reduction, from 23 bits down to 4 bits; thanks to the error robustness of CNNs, this does not degrade CNN performance significantly. Experimental results show that the miss rate of a 1 MB cache compressed with the proposed method is almost the same as that of an uncompressed 2 MB cache, without any decrease in CNN accuracy.