Data-Aware Compression of Neural Networks

2021 
Deep neural networks (DNNs) are getting deeper and larger, which intensifies their data-movement and compute demands. Prior work reduces data movement and computation by exploiting sparsity and similarity, but it focuses only on sparsity and weight similarity and does not exploit input similarity. By synergistically analyzing the similarity and sparsity of both inputs and weights, we show that memory accesses and computations can be reduced by 5.7× and 4.1× beyond what exploiting sparsity alone achieves, and by 3.9× and 2.1× beyond what exploiting weight similarity alone achieves. We propose a new data-aware compression approach, called DANA, that effectively utilizes both sparsity and similarity in inputs and weights. DANA can be implemented orthogonally on top of different hardware DNN accelerators; as an example, we implement it on top of an Eyeriss-like architecture. Our results on four well-known DNNs show that DANA outperforms Eyeriss in average performance and energy consumption by 18× and 83×, respectively. Moreover, DANA is faster than state-of-the-art sparsity-aware and similarity-aware techniques by 4.6× and 4.5×, respectively, and reduces average energy consumption over them by 3.0× and 5.8×.
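The abstract does not describe DANA's actual dataflow, so the following Python sketch is only a rough illustration (function and variable names are assumptions, not the paper's kernels) of how sparsity and input similarity can jointly cut work in a layer: zero operands are skipped, and when consecutive inputs are similar, only the delta contribution is computed and added to the previously computed output.

```python
import numpy as np

def sparsity_similarity_matvec(W, x, prev_x=None, prev_y=None):
    """Illustrative sketch, not DANA's implementation.

    W: (out_features, in_features) weight matrix.
    x: current input vector.
    prev_x, prev_y: previous input and its output, if available.
    """
    if prev_x is not None and prev_y is not None:
        delta = x - prev_x                    # similar inputs -> mostly zero delta
        nz = np.nonzero(delta)[0]             # skip unchanged input elements
        return prev_y + W[:, nz] @ delta[nz]  # update only the changed part
    nz = np.nonzero(x)[0]                     # sparsity: skip zero activations
    return W[:, nz] @ x[nz]

# Example: two similar, sparse inputs (e.g., consecutive frames).
W = np.random.randn(4, 8)
x0 = np.array([0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0])
x1 = x0.copy()
x1[4] = 2.1                                   # only one element changes
y0 = sparsity_similarity_matvec(W, x0)
y1 = sparsity_similarity_matvec(W, x1, prev_x=x0, prev_y=y0)
```

In this toy example, the second call multiplies only the single changed element instead of the full vector; a hardware realization would additionally compress the operands and schedule the surviving multiply-accumulates on the accelerator's PEs.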