A spatiotemporal attention-based ResC3D model for large-scale gesture recognition

2018 
Abnormal gesture recognition has many applications in visual surveillance, crowd behavior analysis, and sensitive video content detection. However, recognizing dynamic gestures in large-scale videos remains challenging because of gesture-irrelevant factors such as variations in illumination, movement path, and background. In this paper, we propose a spatiotemporal attention-based ResC3D model for abnormal gesture recognition with large-scale videos. One key idea is to find a compact and effective representation of the gesture in both spatial and temporal contexts. To eliminate the influence of gesture-irrelevant factors, we first employ enhancement techniques such as Retinex and the hybrid median filter to improve the quality of the RGB and depth inputs. Then, we design a spatiotemporal attention scheme to focus on the most valuable cues related to the moving parts of the gesture. Upon these representations, a ResC3D network, which leverages the advantages of both the residual network and the C3D model, is developed to extract features, together with a canonical correlation analysis-based fusion scheme for blending features from different modalities. The performance of our method is evaluated on the Chalearn IsoGD Dataset. Experiments demonstrate the effectiveness of each module of our method and show that the final accuracy reaches 68.14%, which outperforms other state-of-the-art methods, including our earlier work at the 2017 Chalearn Looking at People Workshop of ICCV.
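As an illustration of the hybrid median filter mentioned above, the following is a minimal sketch of the standard 3x3 variant, not the authors' implementation. The abstract does not state the window size or the border handling, so the 3x3 window, the copy-through of border pixels, and the function name `hybrid_median_3x3` are all assumptions made for this example. The filter takes the median of the plus-shaped neighborhood, the median of the diagonal neighborhood, and the center pixel, then returns the median of those three values.

```python
from statistics import median


def hybrid_median_3x3(img):
    """Apply a 3x3 hybrid median filter to a 2D list of numbers.

    Assumption: border pixels are copied unchanged, since the
    source does not describe any border handling.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so borders pass through
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            # Plus-shaped neighborhood: center plus 4-connected neighbors.
            plus = [c, img[y - 1][x], img[y + 1][x],
                    img[y][x - 1], img[y][x + 1]]
            # X-shaped neighborhood: center plus diagonal neighbors.
            cross = [c, img[y - 1][x - 1], img[y - 1][x + 1],
                     img[y + 1][x - 1], img[y + 1][x + 1]]
            # Final value: median of the two sub-medians and the center.
            out[y][x] = median([median(plus), median(cross), c])
    return out


# Hypothetical inputs: an isolated noise spike and a thin vertical line.
spike = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
line = [[0, 9, 0], [0, 9, 0], [0, 9, 0]]
filtered_spike = hybrid_median_3x3(spike)
filtered_line = hybrid_median_3x3(line)
```

In this sketch the isolated spike is suppressed while the one-pixel-wide line survives, which is the usual motivation for preferring a hybrid median over a plain median when thin structures in depth or RGB frames should be kept.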