Fused DSConv: Optimizing Sparse CNN Inference for Execution on Edge Devices

Jia Guo, Radu Teodorescu, Gagan Agrawal

2021

Accelerating CNN inference on resource-constrained edge devices is becoming increasingly important with the emergence of IoT and edge computing. This paper proposes an execution strategy, and an implementation of it, for efficient CNN inference. The strategy combines two previously published, but not widely used, ideas: direct sparse convolution and fusion of two convolution layers. Together with a scheme for caching intermediate results, this yields a very efficient mechanism for speeding up inference after the model has been sparsified. We also demonstrate an efficient implementation that exploits both multi-core and SIMD parallelism. Our experimental results show that the scheme significantly outperforms existing implementations on an edge device, while also scaling better in a server environment.
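As a rough illustration of the first idea, direct sparse convolution, the sketch below computes a convolution by iterating only over the nonzero weights of a sparsified filter, rather than lowering the input to a matrix or multiplying by zeros. It is a minimal pure-Python sketch under simplifying assumptions (stride 1, no padding, nested lists in place of tensors), not the paper's implementation; the function names `compress_filters` and `direct_sparse_conv` are hypothetical, and layer fusion, caching, multi-core and SIMD parallelism are all omitted.

```python
# Illustrative sketch only: names and data layout are assumptions,
# not taken from the paper. Stride 1, no padding.

def compress_filters(weights):
    """weights[m][c][dy][dx] -> per output channel m, a list of
    (in_channel, dy, dx, value) tuples for the nonzero weights only."""
    sparse = []
    for filt in weights:
        nonzeros = [(c, dy, dx, w)
                    for c, plane in enumerate(filt)
                    for dy, row in enumerate(plane)
                    for dx, w in enumerate(row)
                    if w != 0.0]
        sparse.append(nonzeros)
    return sparse


def direct_sparse_conv(inp, sparse, kh, kw):
    """inp[c][y][x] -> out[m][y][x], visiting only nonzero weights.

    Each nonzero weight scales a shifted window of its input channel
    and accumulates it into the output plane of its filter."""
    h, w = len(inp[0]), len(inp[0][0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in sparse]
    for m, nonzeros in enumerate(sparse):
        for c, dy, dx, wv in nonzeros:
            for y in range(oh):
                src = inp[c][y + dy]   # input row shifted by dy
                dst = out[m][y]
                for x in range(ow):
                    dst[x] += wv * src[x + dx]
    return out


# One 3x3 input channel and one 2x2 filter with two nonzero weights.
inp = [[[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]]
weights = [[[[1.0, 0.0],
             [0.0, -1.0]]]]
sparse = compress_filters(weights)
result = direct_sparse_conv(inp, sparse, 2, 2)
```

In this layout the work is proportional to the number of nonzero weights, which is why the approach only pays off after the model has been sparsified.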

Keywords:

Inference
Cloud computing
Parallel computing
Edge device
SIMD
Computer science
Server
Edge computing
Convolution
Context model
