Multimodal Feature Fusion Model for RGB-D Action Recognition

2021 
Human action recognition remains a challenging task due to the variation and complexity of video data. Moreover, most existing RGB-D action recognition methods simply fuse multimodal features, ignoring the potential semantic relationships between modalities. In this paper, we propose a multimodal recognition model based on a Bilinear Pooling and Attention Network (BPAN), which effectively fuses multimodal features for RGB-D action recognition. First, we adopt efficient data preprocessing methods for the RGB and skeleton data. Then, we propose BPAN, a multimodal fusion network combining RGB and skeleton sequences, which compresses the RGB and skeleton features and projects them into a latent subspace to obtain fused features. Finally, a three-layer fully connected perceptron produces the classification. Experimental results on two datasets demonstrate that our proposed method performs favorably compared with state-of-the-art methods.
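The abstract does not give BPAN's exact formulation, but the pipeline it describes (compress the RGB and skeleton features, fuse them via bilinear pooling into a latent subspace, re-weight the fused feature with attention, then classify with a three-layer perceptron) can be sketched roughly as follows. All dimensions, weight shapes, and the elementwise attention form below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # numerically stable softmax
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bilinear_pool(x, y, W):
    # bilinear interaction of the two modality features, compressed by
    # projecting the flattened outer product into a k-dim latent subspace
    outer = np.outer(x, y).ravel()       # (d_rgb * d_skel,)
    return W @ outer                     # (k,)

def attend(f, w_att):
    # simple elementwise attention over the fused feature (assumed form)
    return softmax(w_att * f) * f

def mlp3(f, W1, b1, W2, b2, W3, b3):
    # three-layer fully connected perceptron with ReLU activations
    h1 = np.maximum(0.0, W1 @ f + b1)
    h2 = np.maximum(0.0, W2 @ h1 + b2)
    return softmax(W3 @ h2 + b3)         # class probabilities

# Toy dimensions (hypothetical): RGB feature, skeleton feature,
# fusion subspace, hidden layer, number of action classes.
d_rgb, d_skel, k, h, n_cls = 8, 6, 16, 12, 5

x = rng.standard_normal(d_rgb)           # preprocessed RGB feature
y = rng.standard_normal(d_skel)          # preprocessed skeleton feature

W = rng.standard_normal((k, d_rgb * d_skel)) / np.sqrt(d_rgb * d_skel)
w_att = rng.standard_normal(k)
W1, b1 = rng.standard_normal((h, k)) / np.sqrt(k), np.zeros(h)
W2, b2 = rng.standard_normal((h, h)) / np.sqrt(h), np.zeros(h)
W3, b3 = rng.standard_normal((n_cls, h)) / np.sqrt(h), np.zeros(n_cls)

fused = attend(bilinear_pool(x, y, W), w_att)
probs = mlp3(fused, W1, b1, W2, b2, W3, b3)
print(probs.shape)                       # (5,) — one probability per class
```

The compression step matters because the raw outer product grows as the product of the two feature dimensions; projecting it into a small subspace keeps the fusion tractable.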