CNN inference: VLSI architecture for convolution layer for 1.2 TOPS

2017 
Deep learning techniques such as Convolutional Neural Networks (CNNs) are becoming popular for image classification, with broad usage spanning automotive, industrial, medical, and robotics applications. A typical CNN consists of multiple layers of convolutions, non-linearities, spatial pooling, and fully connected layers, with 2D convolutions constituting more than 95% of the overall computation. In this paper, we propose a novel systolic, fully pipelined architecture for the convolution layer that scales to high performance at very low area. The architecture is based on two innovative techniques, a vector outer product and an intelligent data feeder, which enable three levels of parallelism (across data values, outputs, and inputs) along with pipelining of compute elements with data movement. The proposed architecture is scalable to a processing throughput of 64/256/512/1024 multiply-accumulate (MAC) operations per cycle. The architecture runs at clock frequencies up to 600 MHz in a low-power 28 nm CMOS process node, enabling a performance of 1.2 tera-operations per second (TOPS).
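
The abstract does not spell out the outer-product dataflow, but the following is a minimal sketch of one plausible reading: for each input channel and kernel tap, a vector of activations (one per output pixel) is combined with a vector of per-output-channel weights in a single outer product, so every multiply in the resulting tile is an independent MAC that can be mapped onto a parallel array. The function names, the channel-first data layout, stride 1, and the absence of padding are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def conv2d_direct(x, w):
    """Reference 2D convolution: x is (Cin, H, W), w is (Cout, Cin, K, K)."""
    Cin, H, W = x.shape
    Cout, _, K, _ = w.shape
    Ho, Wo = H - K + 1, W - K + 1
    y = np.zeros((Cout, Ho, Wo))
    for co in range(Cout):
        for i in range(Ho):
            for j in range(Wo):
                y[co, i, j] = np.sum(w[co] * x[:, i:i + K, j:j + K])
    return y

def conv2d_outer_product(x, w):
    """Same convolution, accumulated as a sequence of vector outer products.

    For each input channel ci and kernel tap (ki, kj), one activation vector
    (one value per output pixel) meets one weight vector (one value per output
    channel); their outer product is a (Cout x P) tile of independent MACs,
    i.e. parallelism across outputs, with each input value broadcast across a
    whole column -- roughly the reuse pattern a systolic MAC array exploits.
    """
    Cin, H, W = x.shape
    Cout, _, K, _ = w.shape
    Ho, Wo = H - K + 1, W - K + 1
    pix = [(i, j) for i in range(Ho) for j in range(Wo)]   # output-pixel coordinates
    y = np.zeros((Cout, len(pix)))
    for ci in range(Cin):
        for ki in range(K):
            for kj in range(K):
                a = np.array([x[ci, i + ki, j + kj] for i, j in pix])  # P activations
                b = w[:, ci, ki, kj]                                   # Cout weights
                y += np.outer(b, a)                                    # Cout * P MACs
    return y.reshape(Cout, Ho, Wo)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 10, 10))      # Cin = 8 input feature maps
w = rng.standard_normal((16, 8, 3, 3))    # Cout = 16, 3x3 kernels
assert np.allclose(conv2d_direct(x, w), conv2d_outer_product(x, w))
```

For reference, the headline figure is consistent with the stated configuration: 1024 MACs per cycle at 600 MHz, counted as two operations per MAC, gives 1024 x 600 MHz x 2 ≈ 1.23 TOPS, i.e. roughly the quoted 1.2 TOPS.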