Parallel convolution algorithm using implicit matrix multiplication on multi-core CPUs

2019 
Convolutional neural networks (CNNs) have been extensively used in machine learning applications. The most time-consuming part of CNNs is the convolution operation. A common approach to implementing convolutions is to recast them as general matrix multiplication (GEMM), known as the im2col+GEMM approach. This approach has two main drawbacks. One is that it requires a large amount of additional memory; the other is that its packing of the input elements of the convolution is not memory-efficient. In this paper, we present a new parallel convolution algorithm using implicit matrix multiplication on multi-core CPUs. Compared with im2col+GEMM, our algorithm reduces the memory footprint and improves packing efficiency. Experimental results on two ARMv8-based multi-core CPUs demonstrate that our algorithm achieves much better performance and scalability than im2col+GEMM in most cases.
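To illustrate the baseline the abstract criticizes, here is a minimal sketch of the im2col+GEMM approach for a single-channel 2-D convolution. This is not the paper's implicit algorithm; it only shows where the extra memory comes from: every input element is copied into roughly kh*kw patch columns before the single matrix multiply. All function names are illustrative.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unpack sliding kh x kw patches of a single-channel image x
    into columns, so convolution becomes one matrix multiply."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1           # "valid" output size
    cols = np.empty((kh * kw, oh * ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_im2col(x, k):
    """2-D valid convolution (as cross-correlation) via im2col + GEMM."""
    kh, kw = k.shape
    cols = im2col(x, kh, kw)                  # extra buffer: ~kh*kw copies of x
    out = k.ravel() @ cols                    # the GEMM step
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return out.reshape(oh, ow)
```

The `cols` buffer is kh*kw times larger than the input region it covers, which is the memory overhead the paper's implicit-matrix-multiplication scheme avoids by indexing into the original input during the multiply instead of materializing the packed matrix.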