Fast Algorithm of Unified Layer Performing Convolution and Average Pooling on the GPU

2019 
Recently, Convolutional Neural Networks (CNNs) have made major contributions to the field of recognition. A CNN has multiple convolution layers, and convolution requires a large number of floating-point operations, so convolution is the bottleneck of a CNN. The main contribution of this paper is to present new methods for computing convolution and average pooling on the GPU. First, we present the fused filter method. Average pooling can be regarded as convolution with an averaging kernel, so a convolution kernel and an averaging kernel can be fused into a single filter. In the fused filter method, convolving with the fused filter reduces the number of floating-point operations needed for convolution and pooling. Second, we present the direct sum method. Convolution and average pooling are commutative. In the direct sum method, switching the order of convolution and average pooling reduces the number of floating-point operations needed for convolution and pooling. Experimental results on an NVIDIA V100 show that the direct sum method attains a speed-up factor of up to 1.8 (single precision) and 4.4 (half precision) over a naive cuDNN implementation.
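The two ideas can be illustrated with a minimal single-channel sketch in plain Python. It is not the paper's GPU implementation. It assumes a stride-1 "valid" convolution (computed as cross-correlation, as is usual in CNNs) followed by non-overlapping p x p average pooling, and all function names (conv2d, avg_kernel, fuse, close) are hypothetical helpers introduced here. The direct sum variant shown is one reading of "switching" the two operations: average the input first with stride 1, then apply the convolution kernel with stride p.

```python
def conv2d(x, w, stride=1):
    """Valid 2-D cross-correlation of image x with kernel w (lists of lists)."""
    kh, kw = len(w), len(w[0])
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    return [[sum(w[u][v] * x[i * stride + u][j * stride + v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def avg_kernel(p):
    """Average pooling over a p x p window, written as a kernel."""
    return [[1.0 / (p * p)] * p for _ in range(p)]


def fuse(w, p):
    """Fuse kernel w with the p x p averaging kernel into one larger kernel."""
    kh, kw = len(w), len(w[0])
    fused = [[0.0] * (kw + p - 1) for _ in range(kh + p - 1)]
    for u in range(kh):
        for v in range(kw):
            for a in range(p):
                for b in range(p):
                    fused[u + a][v + b] += w[u][v] / (p * p)
    return fused


def close(a, b, tol=1e-9):
    """True if two 2-D lists have the same shape and nearly equal entries."""
    return (len(a) == len(b) and
            all(len(ra) == len(rb) and
                all(abs(s - t) <= tol for s, t in zip(ra, rb))
                for ra, rb in zip(a, b)))


# Illustrative data (assumed, not from the paper): 8 x 8 input, 3 x 3 kernel.
x = [[float((7 * i + 3 * j) % 5) for j in range(8)] for i in range(8)]
w = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
p = 2

# Baseline: convolution (stride 1), then average pooling (stride p).
baseline = conv2d(conv2d(x, w), avg_kernel(p), stride=p)

# Fused filter: a single convolution with the fused kernel at stride p.
fused = conv2d(x, fuse(w, p), stride=p)

# Direct sum: average the input first (stride 1), then convolve at stride p.
direct = conv2d(conv2d(x, avg_kernel(p)), w, stride=p)

print(close(baseline, fused), close(baseline, direct))
```

In this sketch, with a 3 x 3 kernel and 2 x 2 pooling, the baseline spends 4 x 9 = 36 multiplications in the convolution for each pooled output, while the fused 4 x 4 kernel needs 16. In the direct sum variant the averaged input does not depend on the convolution kernel, so under the assumptions above it could be computed once and reused across filters.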