Adaptive spatiotemporal graph convolutional network with intermediate aggregation of multi-stream skeleton features for action recognition

2022 
Video-based action recognition is a challenging problem due to the rapid and uncertain changes in human actions. Recent studies show that combining video with human body skeleton data helps improve action recognition performance. These methods generally use graph convolutional networks (GCNs) to extract structural features of the human body joints from skeleton data. Yet, most GCN-based methods have limitations in skeleton-based action recognition. (1) The graph structure of the human body joints is time-invariant, making it difficult to represent the changing relationships between joints across actions. (2) Methods relying on single-stream models utilize only limited information from the skeleton data, such as joints or bones, and fail to consider coherent features of movements. (3) Methods relying on multi-stream models have a large number of parameters and are inefficient for real-life applications. To address these problems, we propose an adaptive spatiotemporal graph convolutional network with intermediate aggregation of multi-stream skeleton features for action recognition. First, our method learns an adaptive graph structure representing the changing relationships between joints. Second, we employ a multi-stream model to extract various features from the skeleton, including a joint stream, a bone stream, and a motion stream. Moreover, an intermediate aggregation strategy is employed to aggregate these features and to reduce the number of parameters of the model. The proposed method has been validated on various benchmarks and a real-world abnormal action dataset. Extensive experimental results show that our method achieves excellent performance in skeleton-based action recognition.
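As a minimal sketch of how the three streams mentioned in the abstract are commonly derived in skeleton-based methods (the exact preprocessing of this paper is not given here): the bone stream is the vector from each parent joint to its child, and the motion stream is the frame-to-frame displacement of each joint. The bone pairs and the 5-joint toy skeleton below are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical bone list: (child, parent) index pairs for a toy 5-joint skeleton.
BONE_PAIRS = [(1, 0), (2, 1), (3, 0), (4, 3)]

def build_streams(joints):
    """Derive bone and motion streams from a joint-coordinate tensor.

    joints: array of shape (T, V, C) -- T frames, V joints, C coordinates.
    Returns (joint, bone, motion) streams, each of shape (T, V, C).
    """
    bones = np.zeros_like(joints)
    for child, parent in BONE_PAIRS:
        # A bone is the vector from a parent joint to its child joint.
        bones[:, child] = joints[:, child] - joints[:, parent]
    # Motion is the per-joint displacement between consecutive frames;
    # the last frame is zero-padded to keep the tensor shape.
    motion = np.zeros_like(joints)
    motion[:-1] = joints[1:] - joints[:-1]
    return joints, bones, motion

# Example: 3 frames, 5 joints, 2-D coordinates.
seq = np.arange(3 * 5 * 2, dtype=float).reshape(3, 5, 2)
j, b, m = build_streams(seq)
print(j.shape, b.shape, m.shape)  # (3, 5, 2) (3, 5, 2) (3, 5, 2)
```

In a multi-stream GCN each of these tensors would feed a separate (or, with intermediate aggregation, a partially shared) network branch; keeping the streams the same shape makes that branching straightforward.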