Optimizing FFT-Based Convolution on ARMv8 Multi-core CPUs

Qinglin Wang, Dongsheng Li, Xiandong Huang, Siqi Shen, Songzhu Mei, Jie Liu

2020

Convolutional Neural Networks (CNNs) are widely used in machine learning applications but are very time-consuming to run, and most of their execution time is spent in convolutional layers. A common way to implement convolutions is the FFT-based approach, which reduces the arithmetic complexity of convolution without losing much precision. As the performance of ARMv8 multi-core CPUs improves, they can also be used to run CNNs, as Intel x86 CPUs are. In this paper, we present a new parallel FFT-based convolution implementation for ARMv8 multi-core CPUs. The implementation makes efficient use of these CPUs through a series of computation and memory optimizations. Experimental results on two ARMv8 multi-core CPUs show that our new implementation performs much better than two existing approaches in most cases.
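As a rough illustration of the FFT-based idea mentioned in the abstract, the sketch below computes a 1D linear convolution two ways: directly, and by transforming both inputs, multiplying pointwise in the frequency domain, and transforming back. It is a minimal pure-Python sketch and not the paper's ARMv8 implementation; the function names `fft`, `direct_convolve`, and `fft_convolve` are hypothetical, and the power-of-two zero padding is an assumption made to keep the radix-2 FFT simple.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized).

    Assumes len(x) is a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def direct_convolve(signal, kernel):
    """Reference linear convolution with O(N * K) multiplications."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via pointwise multiplication in the frequency domain."""
    out_len = len(signal) + len(kernel) - 1
    # Zero-pad to a power of two that is at least the full output length,
    # so the circular convolution computed by the FFT equals the linear one.
    size = 1
    while size < out_len:
        size *= 2
    fs = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    fk = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    prod = [a * b for a, b in zip(fs, fk)]
    inv = fft(prod, inverse=True)
    # The inverse transform above is unnormalized, so divide by size here.
    return [v.real / size for v in inv[:out_len]]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    kernel = [1.0, 0.0, -1.0]
    print(direct_convolve(signal, kernel))
    print(fft_convolve(signal, kernel))
```

The two results agree only up to floating-point rounding, which is the small precision loss the abstract refers to. CNN layers typically use 2D cross-correlation over many channels, so a real implementation would apply the same transform, multiply, inverse-transform pattern per tile and channel.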
