XwiseNet: action recognition with Xwise separable convolutions

Hefei Ling, Yao Chen, Jiazhong Chen, Lei Wu, Yuxuan Shi, Jing Deng

2020

With the emergence of a large number of video resources, video action recognition is attracting much attention. Recently, given the outstanding performance of three-dimensional (3D) convolutional neural networks (CNNs), many works have begun to apply them to action recognition and have obtained satisfactory results. However, little attention has been paid to reducing the model size and computation cost of 3D CNNs. In this paper, we first propose a novel 3D convolution called the Xwise Separable Convolution, and then construct an original 3D CNN called XwiseNet. Our work aims to make 3D CNNs lightweight without reducing their recognition accuracy. Our key idea is to decouple the 3D convolution as far as possible along the channel, spatial, and temporal dimensions. Experiments verify that XwiseNet outperforms 3D-ResNet-50 on the Mini-Kinetics benchmark with only 6% of the training parameters and 12% of the computation cost.
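
The abstract does not spell out the exact layer layout of the Xwise Separable Convolution, so the following is only a rough illustration of why decoupling a 3D convolution along the channel, spatial, and temporal dimensions can shrink a layer. It assumes one plausible factorization (a 1x1x1 pointwise convolution, followed by a depthwise spatial convolution and a depthwise temporal convolution), which may differ from the authors' design. The function names and the 64-channel, 3x3x3 example are hypothetical, and bias terms are ignored.

```python
def standard_conv3d_params(c_in, c_out, kt, kh, kw):
    """Weights in an ordinary 3D convolution (no bias)."""
    return c_in * c_out * kt * kh * kw


def decoupled_conv3d_params(c_in, c_out, kt, kh, kw):
    """Weights in an assumed fully decoupled factorization (no bias).

    This is an illustrative assumption, not the layout from the paper.
    """
    pointwise = c_in * c_out   # 1x1x1 conv: mixes channels only
    spatial = c_out * kh * kw  # depthwise 1 x kh x kw conv: one filter per channel
    temporal = c_out * kt      # depthwise kt x 1 x 1 conv: one filter per channel
    return pointwise + spatial + temporal


# Hypothetical layer: 64 input channels, 64 output channels, 3x3x3 kernel.
full = standard_conv3d_params(64, 64, 3, 3, 3)
light = decoupled_conv3d_params(64, 64, 3, 3, 3)
print(full, light)
```

For this single hypothetical layer the decoupled form needs far fewer weights than the standard one. The whole-network figures reported in the abstract (6% of the parameters, 12% of the computation) depend on the full XwiseNet architecture and cannot be derived from this per-layer count.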
