Hierarchical Dropped Convolutional Neural Network for Speed Insensitive Human Action Recognition

2018 
Human action recognition from skeleton data has many potential applications in content-based action retrieval and intelligent surveillance, driven by the wide availability of depth sensors and robust skeleton estimation algorithms. Previous methods encode spatio-temporal skeleton joints as a compact color image and then use a Convolutional Neural Network (CNN) to extract more discriminative deep features. However, these methods ignore speed variation, which is common in practice and can cause severe intra-class variation among actions of the same type. To address this problem, this paper presents a novel hierarchical dropped CNN architecture, which is constructed in two stages. First, a dropped CNN (d-CNN) is developed to extract deep features from a probabilistic speed-insensitive color image. This image expresses both the spatial distribution and the temporal evolution of skeleton joints while avoiding the effect of speed variation. Second, to enhance temporal discriminative power, d-CNN is extended to a hierarchical structure (h-CNN), in which multiple scales of temporal information are encoded. Extensive experiments on the benchmark MSRC-12 dataset and on NTU RGB+D, the largest dataset considered, verify the effectiveness and robustness of the proposed method.
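To make the speed problem concrete, the following is a minimal sketch of the kind of color-image encoding used by the earlier methods mentioned above, not the paper's d-CNN or its probabilistic image. It assumes a skeleton sequence stored as a NumPy array of shape (frames, joints, 3) and maps the x, y, z coordinates to the R, G, B channels. The function names `skeleton_to_color_image` and `simulate_speed` are hypothetical and introduced only for illustration.

```python
import numpy as np


def skeleton_to_color_image(seq):
    """Encode a (T, J, 3) skeleton sequence as a (J, T, 3) uint8 image.

    Assumption: each coordinate axis (x, y, z) is min-max normalised over the
    whole sequence and mapped to one colour channel (R, G, B).
    """
    seq = np.asarray(seq, dtype=np.float64)
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    # Guard against a constant channel to avoid division by zero.
    scale = np.where(hi > lo, hi - lo, 1.0)
    img = np.round(255.0 * (seq - lo) / scale).astype(np.uint8)
    # Rows index joints, columns index time.
    return img.transpose(1, 0, 2)


def simulate_speed(seq, factor):
    """Resample frames to mimic an action performed `factor` times faster.

    Assumption: speed change is modelled as uniform temporal resampling;
    factor > 1 skips frames, factor < 1 repeats them.
    """
    seq = np.asarray(seq)
    idx = np.floor(np.arange(0, seq.shape[0], factor)).astype(int)
    return seq[idx]


# Synthetic stand-in for a real recording: 40 frames, 20 joints.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(40, 20, 3))

normal_img = skeleton_to_color_image(sequence)
fast_img = skeleton_to_color_image(simulate_speed(sequence, 2.0))
print(normal_img.shape, fast_img.shape)
```

Under these assumptions, the same action performed twice as fast yields an image with half as many columns, so a CNN that sees a fixed-size resized version receives a temporally stretched or compressed pattern. This is the intra-class variation that the abstract says the proposed speed-insensitive image is designed to avoid.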