A Low-latency Sparse-Winograd Accelerator for Convolutional Neural Networks

2019 
Low-latency and low-power implementations of Convolutional Neural Networks (CNNs) are highly desired in budget-restricted scenarios. Pruning and the Winograd algorithm are two representative approaches to reducing the computational complexity of CNNs. Coupling them is very attractive, but the Winograd transformation removes the data sparsity introduced by pruning. In this paper, we present a low-latency sparse-Winograd CNN accelerator (LSW-CNN) for pruned Winograd CNN models. A ReLU-modified Winograd algorithm is employed to address the zero-refilling issue. Our design fully leverages the sparsity in both weights and activations, and thus eliminates all unnecessary computations and cycles. Moreover, a novel fast mask-indexing algorithm for sparse data compression is developed, and accumulation buffers are scaled to reduce the latency caused by irregular serial channel merging. On VGG-16, experimental results demonstrate that the latency of LSW-CNN is 5.1x and 1.7x lower than that of state-of-the-art dense-Winograd and sparse-Winograd accelerators, respectively. Hardware resource consumption is also significantly reduced.
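As a minimal illustration of why pruning and the Winograd algorithm do not compose directly, the sketch below applies the filter transform of Winograd F(2x2, 3x3) to a hypothetical pruned 3x3 kernel (the kernel values and the NumPy demo are assumptions for illustration, not the LSW-CNN design): the transformed filter comes out dense even though the spatial-domain kernel is mostly zero.

```python
import numpy as np

# Filter transform matrix G for Winograd F(2x2, 3x3): U = G g G^T.
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

# Hypothetical pruned 3x3 kernel: 5 of 9 weights are zero (~56% sparsity).
g = np.array([[0.5, 0.0, -0.2],
              [0.0, 0.0,  0.0],
              [0.1, 0.0,  0.4]])

# Winograd-domain filter (4x4).
U = G @ g @ G.T

print("spatial-domain sparsity :", np.mean(g == 0.0))  # ~0.56
print("Winograd-domain sparsity:", np.mean(U == 0.0))  # 0.0 -- the transform refills the zeros
```

This sparsity loss is what motivates performing the ReLU (and the pruning) in the Winograd domain, so that the zeros the accelerator exploits survive the transformation.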