Row Streaming Dataflow using Chaining Buffer and Systolic Array+ Structure

2021 
Convolutional Neural Networks (CNNs) are widely used to solve complex problems in various fields, such as image recognition, image classification, and video analysis. Convolutional (CONV) layers are the most computational part of the CNN inference; various architectures have been proposed to process it efficiently. Among those, a systolic array consists of a 2D array of processing elements, which handle GEneral Matrix Multiplication (GEMM) with high efficiency. However, to process a CONV layer as a GEMM type, image-to-column (im2col) processing, which is also called lowering, is required per layer, necessitating a larger on-chip memory and a considerable amount of repetitive on-chip memory access. In this letter, we propose a systolic array+ (SysAr+) structure augmented with a chaining buffer and a row-streaming dataflow that can maximize data reuse without the im2col pre-process in the CONV layer and the repetitive access from the large on-chip memory. By applying the proposed method to the 3×3 CONV layers, we reduce the energy consumption by up to 19.7 percent in ResNet and 37.4 percent in DenseNet with an area overhead of 1.54 percent in SysAr+, and we improve the performance by up to 32.4 percent in ResNet and 12.1 percent in DenseNet.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    9
    References
    1
    Citations
    NaN
    KQI
    []