Learning spatio-temporal features for action recognition from the side of the video

2016 
We introduce a novel spatio-temporal feature learning approach for action recognition. First, we automatically detect and track the actor and map the action track to a cuboid. We then split the cuboid into block sequences and represent each block sequence as a data vector by concatenating its block shape features. For each action category, a two-layer network learns the distribution of the data vectors. The first layer consists of multiple Restricted Boltzmann Machines (RBMs), each trained on the data vectors that share the same spatial location. The output of the second-layer RBM is the learned spatio-temporal feature. Finally, we train a Support Vector Machine classifier for each class to recognize the actions. Experiments on challenging data sets confirm the effectiveness of our approach.
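
The step that turns a cuboid into per-location data vectors can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the block shape features, so `block_feature` below is a hypothetical stand-in (the fraction of foreground pixels in a block), and the names `block_sequence_vectors`, `rows`, and `cols` are likewise assumptions made for illustration. The cuboid is assumed to be a list of equally sized binary silhouette frames.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]  # binary mask: H rows x W columns, 1 = actor pixel


def block_feature(frame: Frame, r0: int, c0: int, bh: int, bw: int) -> List[float]:
    """Hypothetical block shape feature: fraction of foreground pixels.

    The abstract does not specify the feature; this is a placeholder.
    """
    total = sum(frame[r][c] for r in range(r0, r0 + bh) for c in range(c0, c0 + bw))
    return [total / (bh * bw)]


def block_sequence_vectors(
    cuboid: List[Frame], rows: int, cols: int
) -> Dict[Tuple[int, int], List[float]]:
    """Split a cuboid into rows x cols block sequences.

    Each block sequence follows one spatial location (i, j) through all
    frames; its data vector concatenates the block features over time.
    """
    height, width = len(cuboid[0]), len(cuboid[0][0])
    if height % rows or width % cols:
        raise ValueError("frame size must be divisible by the block grid")
    bh, bw = height // rows, width // cols

    vectors: Dict[Tuple[int, int], List[float]] = {}
    for i in range(rows):
        for j in range(cols):
            vec: List[float] = []
            for frame in cuboid:  # concatenate along the time axis
                vec.extend(block_feature(frame, i * bh, j * bw, bh, bw))
            vectors[(i, j)] = vec
    return vectors


# Toy cuboid: three identical 4x4 frames, split into a 2x2 block grid.
frame = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
vectors = block_sequence_vectors([frame, frame, frame], rows=2, cols=2)
```

Under this layout, the vectors stored under a given key `(i, j)` across training videos would form the training set for the first-layer RBM assigned to that spatial location, matching the abstract's statement that each RBM is trained on data vectors with the same spatial location.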