Multi-modality learning for human action recognition

2020 
Multi-modality based human action recognition is a topic of growing interest. Multi-modal data can provide richer and more complementary information than a single modality. However, it is difficult for multi-modality learning to capture spatial-temporal information effectively from entire RGB and depth sequences. In this paper, to obtain a better representation of spatial-temporal information, we propose a bidirectional rank pooling method to construct RGB Visual Dynamic Images (VDIs) and Depth Dynamic Images (DDIs). Furthermore, we design an effective segment-based convolutional network (ConvNets) architecture based on a multi-modality hierarchical fusion strategy for human action recognition. The proposed method achieves state-of-the-art results on the widely used NTU RGB+D, SYSU 3D HOI and UWA3D II datasets.
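As context for the dynamic-image construction mentioned above, the sketch below shows approximate rank pooling (in the style of Bilen et al.'s dynamic images), applied once in the forward direction and once over the reversed sequence; this is one plausible reading of "bidirectional rank pooling", not the paper's exact implementation. The function names and the NumPy-based formulation are illustrative assumptions.

```python
import numpy as np

def dynamic_image(frames):
    """Approximate rank pooling: collapse a frame sequence (T, H, W, C)
    into a single image that encodes temporal evolution.
    Coefficients follow the harmonic-number form alpha_t =
    2*(T - t + 1) - (T + 1)*(H_T - H_{t-1}), with H_0 = 0."""
    T = frames.shape[0]
    H = np.cumsum(1.0 / np.arange(1, T + 1))        # H[t-1] = sum_{i=1..t} 1/i
    alphas = np.array([
        2.0 * (T - t + 1) - (T + 1) * (H[T - 1] - (H[t - 2] if t > 1 else 0.0))
        for t in range(1, T + 1)
    ])
    # Weighted sum of frames along the temporal axis.
    return np.tensordot(alphas, frames.astype(np.float64), axes=(0, 0))

def bidirectional_dynamic_images(frames):
    """One possible bidirectional variant (assumption): pool the sequence
    forward and backward, yielding two images per modality (e.g. a VDI pair
    from RGB frames or a DDI pair from depth maps)."""
    forward = dynamic_image(frames)
    backward = dynamic_image(frames[::-1])
    return forward, backward
```

In such a setup the forward and backward dynamic images of each modality would typically be fed to the segment-based ConvNets and fused hierarchically, though the exact fusion points are defined in the paper itself.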