Structural iMoSIFT for Human Action Recognition

2016 
Classic local space-time features are successful representations for action recognition in videos. However, these features often confuse object motion with camera motion, which seriously degrades recognition accuracy. In this paper, we propose an improved motion scale-invariant feature transform (iMoSIFT) algorithm to eliminate the negative effects caused by camera motion. Building on iMoSIFT, we consider the spatio-temporal structural relationships among iMoSIFT interest points and adopt locally weighted word-context descriptors to encode these relationships. We then use a two-layer bag-of-words (BoW) representation for every video clip. The proposed approach is evaluated on three publicly available datasets, namely Weizmann, KTH, and UCF Sports. The experimental results clearly demonstrate the effectiveness of the proposed approach.
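The two-layer BoW representation described in the abstract can be illustrated with a minimal sketch: one histogram layer quantizes the iMoSIFT point descriptors against a visual vocabulary, and a second layer quantizes the word-context (structural) descriptors against a separate vocabulary, with the two normalized histograms concatenated into the final clip descriptor. The function names, codebook construction, and descriptor shapes below are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize descriptors to their nearest codeword and return an
    L1-normalized histogram of word counts (one BoW layer)."""
    # Squared Euclidean distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

def two_layer_bow(point_descs, context_descs, codebook1, codebook2):
    """Hypothetical two-layer BoW clip descriptor:
    layer 1 encodes the iMoSIFT point descriptors,
    layer 2 encodes the locally weighted word-context descriptors."""
    h1 = bow_histogram(point_descs, codebook1)    # appearance/motion words
    h2 = bow_histogram(context_descs, codebook2)  # structural-context words
    return np.concatenate([h1, h2])
```

In practice the codebooks would be learned (e.g. by k-means over training descriptors) and the concatenated histogram fed to a classifier such as an SVM; this sketch only shows the encoding step.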