Multimodal Feature Fusion Model for RGB-D Action Recognition

2021 
Human action recognition remains a challenging task due to the variation and complexity of video data. Moreover, most existing RGB-D action recognition methods simply fuse multimodal features, ignoring the potential semantic relationships between modalities. In this paper, we propose a multimodal recognition model based on a Bilinear Pooling and Attention Network (BPAN), which effectively fuses multimodal features for RGB-D action recognition. First, we adopt efficient data preprocessing methods for the RGB and skeleton data. Then, we propose BPAN, a multimodal fusion network combining RGB and skeleton sequences, which compresses the RGB and skeleton features and projects them into a latent subspace to obtain fused features. Finally, a three-layer fully connected perceptron produces the classification. Experimental results on two datasets demonstrate that our proposed method performs favorably compared with state-of-the-art methods.
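The abstract does not give BPAN's exact formulation, but the pipeline it describes (compress the RGB and skeleton features, fuse them via bilinear pooling into a latent subspace, re-weight the fused feature with attention, then classify with a three-layer perceptron) can be sketched roughly as follows. All dimensions, weight shapes, and the elementwise attention form below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # numerically stable softmax
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bilinear_pool(x, y, W):
    # bilinear interaction of the two modality features, compressed by
    # projecting the flattened outer product into a k-dim latent subspace
    outer = np.outer(x, y).ravel()       # (d_rgb * d_skel,)
    return W @ outer                     # (k,)

def attend(f, w_att):
    # simple elementwise attention over the fused feature (assumed form)
    return softmax(w_att * f) * f

def mlp3(f, W1, b1, W2, b2, W3, b3):
    # three-layer fully connected perceptron with ReLU activations
    h1 = np.maximum(0.0, W1 @ f + b1)
    h2 = np.maximum(0.0, W2 @ h1 + b2)
    return softmax(W3 @ h2 + b3)         # class probabilities

# Toy dimensions (hypothetical): RGB feature, skeleton feature,
# fusion subspace, hidden layer, number of action classes.
d_rgb, d_skel, k, h, n_cls = 8, 6, 16, 12, 5

x = rng.standard_normal(d_rgb)           # preprocessed RGB feature
y = rng.standard_normal(d_skel)          # preprocessed skeleton feature

W = rng.standard_normal((k, d_rgb * d_skel)) / np.sqrt(d_rgb * d_skel)
w_att = rng.standard_normal(k)
W1, b1 = rng.standard_normal((h, k)) / np.sqrt(k), np.zeros(h)
W2, b2 = rng.standard_normal((h, h)) / np.sqrt(h), np.zeros(h)
W3, b3 = rng.standard_normal((n_cls, h)) / np.sqrt(h), np.zeros(n_cls)

fused = attend(bilinear_pool(x, y, W), w_att)
probs = mlp3(fused, W1, b1, W2, b2, W3, b3)
print(probs.shape)                       # (5,) — one probability per class
```

The compression step matters because the raw outer product grows as the product of the two feature dimensions; projecting it into a small subspace keeps the fusion tractable.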