NEMO-CNN: An Efficient Near-Memory Accelerator for Convolutional Neural Networks

2021 
The relevance of Deep Learning applications has skyrocketed in the last few years, exposing key weaknesses of traditional Von Neumann hardware architectures. With large amounts of data to be fetched from memory, the efficiency of these systems degrades by roughly an order of magnitude for each level of the memory hierarchy that is traversed (e.g., data cache, on-chip SRAM, and off-chip DRAM). This issue is even more relevant when we consider that Convolutional Neural Networks (CNNs) comprise tens of millions of parameters and require billions of operations per second to achieve acceptable performance. To mitigate this so-called memory-wall problem, we introduce NEMO-CNN: a high-performance hardware accelerator built around the Near-Memory Computing paradigm, i.e., a design methodology based on distributed memory blocks enhanced with nearby processing elements. Coupled with a smart mapping strategy that slices the CNN structure along its depth, our solution drastically reduces the amount of data exchanged between off- and on-chip memories by executing each slice concurrently on dedicated processing elements that only access local data. Experimental results on the VGG-16, DarkNet-19, and TinyYOLOv2 networks demonstrate that our solution achieves a top efficiency of 60.7 FPS/W, outperforming existing CNN accelerators.
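The depth-slicing mapping described above can be illustrated with a minimal sketch. Here, simple dense ReLU layers stand in for the CNN's convolutional layers, and the network's depth (its sequence of layers) is partitioned across a handful of processing elements (PEs); each PE keeps its slice's weights in local memory, so only activations cross PE boundaries. The layer sizes, slice count, and `linear_relu` helper are illustrative assumptions, not details of the NEMO-CNN design.

```python
import numpy as np

def linear_relu(x, w):
    # Stand-in for one CNN layer: dense matmul followed by ReLU.
    return np.maximum(w @ x, 0.0)

rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 16)) for _ in range(8)]  # 8-layer toy network
x = rng.standard_normal(16)

# Monolithic baseline: a single engine streams every layer's
# weights through the memory hierarchy.
ref = x
for w in layers:
    ref = linear_relu(ref, w)

# Depth-sliced mapping: 4 PEs, each holding 2 consecutive layers
# locally. Weights never leave a PE; only the (small) activation
# vector is forwarded from one PE to the next.
n_pe = 4
pe_slices = [layers[i:i + 2] for i in range(0, len(layers), 2)]

act = x
for local_layers in pe_slices:      # pipeline stage = one PE
    for w in local_layers:          # uses only locally stored weights
        act = linear_relu(act, w)

# Slicing changes where data lives, not what is computed.
assert np.allclose(act, ref)
```

Because each PE's weight slice is loaded once and reused for every input, the off-chip traffic per inference drops from "all weights, every frame" to "activations only", which is the source of the efficiency gain the abstract claims.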