Fused DSConv: Optimizing Sparse CNN Inference for Execution on Edge Devices

2021 
Accelerating CNN inference on resource-constrained edge devices is becoming increasingly important with the emergence of IoT and edge computing. This paper proposes an execution strategy and an implementation for efficient CNN inference. Our execution strategy combines two previously published, but not widely used, ideas: direct sparse convolution and fusion of two convolution layers. Together with a scheme for caching intermediate results, this yields a very efficient mechanism for speeding up inference after the model has been sparsified. We also demonstrate an efficient implementation that uses both multi-core and SIMD parallelism. Our experimental results show that our scheme significantly outperforms existing implementations on an edge device, while also scaling better in a server environment.
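To make the first of the two ideas concrete, the following is a minimal sketch of direct sparse convolution in general, not the paper's implementation. It assumes stride 1, no padding, and plain nested lists in [channel][row][column] order. The function names (dense_conv2d, compress_weights, direct_sparse_conv2d) are hypothetical and chosen only for illustration. Layer fusion, caching of intermediate results, multi-core execution, and SIMD are not shown.

```python
def dense_conv2d(x, w):
    """Reference dense convolution (stride 1, no padding).

    x: input as nested lists indexed [C][H][W]
    w: weights as nested lists indexed [M][C][R][S]
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    E, F = H - R + 1, W - S + 1
    out = [[[0.0] * F for _ in range(E)] for _ in range(M)]
    for m in range(M):
        for c in range(C):
            for r in range(R):
                for s in range(S):
                    for i in range(E):
                        for j in range(F):
                            out[m][i][j] += w[m][c][r][s] * x[c][i + r][j + s]
    return out


def compress_weights(w):
    """Keep only the nonzero weights of each filter as (c, r, s, value)."""
    return [
        [
            (c, r, s, v)
            for c, plane in enumerate(filt)
            for r, row in enumerate(plane)
            for s, v in enumerate(row)
            if v != 0
        ]
        for filt in w
    ]


def direct_sparse_conv2d(x, sparse_w, R, S):
    """Convolve directly on the input, visiting only nonzero weights.

    No im2col-style lowering is performed: each nonzero weight scales a
    shifted window of its input channel and accumulates into the output.
    R and S are the filter height and width (assumed known to the caller).
    """
    H, W = len(x[0]), len(x[0][0])
    E, F = H - R + 1, W - S + 1
    out = [[[0.0] * F for _ in range(E)] for _ in sparse_w]
    for m, nonzeros in enumerate(sparse_w):
        for c, r, s, v in nonzeros:
            for i in range(E):
                row_in = x[c][i + r]
                row_out = out[m][i]
                # Contiguous inner loop: the natural target for SIMD
                # in a compiled implementation.
                for j in range(F):
                    row_out[j] += v * row_in[j + s]
    return out
```

In this sketch the work is proportional to the number of nonzero weights rather than to the full filter size, which is why the approach pays off only after the model has been sparsified.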