Fast Algorithm of Unified Layer Performing Convolution and Average Pooling on the GPU
2019
Recently, Convolutional Neural Networks (CNNs) have made a major contribution to the field of recognition. A CNN has multiple convolution layers, and convolution operations require a large number of floating-point operations, so convolution is the bottleneck of a CNN. The main contribution of this paper is to present new methods for computing convolution and average pooling on the GPU. First, we present the fused filter method. Average pooling can be regarded as a kernel, so a convolution kernel and an averaging kernel can be fused into one. In the fused filter method, convolution with the fused filter reduces the number of floating-point operations needed for convolution and pooling. We also present the direct sum method. Convolution and average pooling are commutative. In the direct sum method, switching the order of convolution and average pooling reduces the number of floating-point operations needed for convolution and pooling. Experimental results using an NVIDIA V100 show that the direct sum method attains a speed-up factor of up to 1.8 (single precision) and 4.4 (half precision) over the naive cuDNN implementation.
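The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is a CPU reference, not the paper's GPU kernels, and it rests on assumptions the abstract does not state: a single channel, "valid" convolution with stride 1 (cross-correlation, as is usual in CNNs), and non-overlapping p x p average pooling. All function names are hypothetical, and `direct_sum_method` is only one reading of "switching convolution and average pooling".

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Valid 2-D cross-correlation of input x with kernel w."""
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(x[r:r + kh, c:c + kw] * w)
    return out

def avg_pool(y, p):
    """Non-overlapping p x p average pooling (extra rows/columns dropped)."""
    oh, ow = y.shape[0] // p, y.shape[1] // p
    return y[:oh * p, :ow * p].reshape(oh, p, ow, p).mean(axis=(1, 3))

def fuse_filter(w, p):
    """Fold the p x p averaging kernel into w.

    The result has size (k + p - 1) in each dimension: it is the sum of
    p * p shifted copies of w, scaled by 1 / p^2.
    """
    kh, kw = w.shape
    f = np.zeros((kh + p - 1, kw + p - 1))
    for c in range(p):
        for d in range(p):
            f[c:c + kh, d:d + kw] += w
    return f / (p * p)

def fused_filter_method(x, w, p):
    """One strided convolution with the fused kernel."""
    return conv2d(x, fuse_filter(w, p), stride=p)

def direct_sum_method(x, w, p):
    """Average the input first (stride-1 box filter), then convolve with stride p."""
    box = np.ones((p, p)) / (p * p)
    return conv2d(conv2d(x, box, stride=1), w, stride=p)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((10, 10))
    w = rng.standard_normal((3, 3))
    reference = avg_pool(conv2d(x, w), 2)  # convolution followed by pooling
    print(np.allclose(reference, fused_filter_method(x, w, 2)))
    print(np.allclose(reference, direct_sum_method(x, w, 2)))
```

Under these assumptions, with a 3 x 3 kernel and 2 x 2 pooling, the unfused path spends 4 x 9 = 36 multiply-adds per pooled output before averaging, while the fused 4 x 4 kernel needs 16. The operation counts and speed-ups reported in the paper refer to its own GPU implementations, not to this sketch.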