Pyramidal Graph Convolutional Network for Skeleton-based Human Action Recognition

2021 
The emergence of low-cost depth sensors opens up new possibilities for skeleton-based human action recognition. Recent methods for this task have made significant progress by incorporating graph convolution. However, they (1) have limitations in modeling complex and variable temporal dynamics, and (2) cannot fully exploit the complementarity of spatial and temporal features. In addition, (3) the loss function of these methods has an inherent weakness in optimizing intraclass compactness. To address these issues, we propose a pyramidal graph convolutional network (PY-GCN) in this paper. Specifically, (1) an effective yet efficient single-oriented pyramidal convolution is proposed. It involves multiple kernels of varying sizes and depths that can capture different levels of temporal dynamics at multiple scales. (2) A pseudo-two-stream structure for the basic block of the network is proposed to comprehensively aggregate discriminative cross-spatiotemporal features. Moreover, (3) a pairwise Gaussian loss is introduced alongside the cross-entropy loss, allowing the model to focus on both intraclass compactness and interclass separability. Our PY-GCN achieves state-of-the-art performance on three challenging large-scale datasets.
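The core idea of the pyramidal convolution described above can be illustrated with a minimal sketch: several temporal kernels of different sizes are applied in parallel over a skeleton sequence and their outputs are concatenated, so short- and long-range dynamics are captured side by side. This is an illustrative NumPy toy, not the paper's implementation; the function name, the fixed box kernels, and the `(T, C)` input layout are assumptions (a real model would learn the kernels and, per the abstract, also vary their depths).

```python
import numpy as np

def temporal_pyramid_conv(x, kernel_sizes=(3, 5, 9)):
    """Multi-scale temporal filtering of a skeleton sequence.

    x: array of shape (T, C) -- T frames, C joint-coordinate channels.
    Each branch convolves every channel with a box kernel of a different
    temporal extent (a stand-in for a learned kernel); branch outputs are
    concatenated along the channel axis.
    """
    branches = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k  # placeholder kernel; learned in a real model
        # mode="same" padding keeps the temporal length T unchanged
        out = np.stack(
            [np.convolve(x[:, c], kernel, mode="same") for c in range(x.shape[1])],
            axis=1,
        )
        branches.append(out)
    # shape: (T, C * len(kernel_sizes))
    return np.concatenate(branches, axis=1)
```

For a 16-frame, 3-channel input, the output has shape `(16, 9)`: the temporal length is preserved while the channel dimension grows by one factor per pyramid level.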