Temporal Segment Connection Network for Action Recognition

2020 
Two-stream Convolutional Neural Networks have shown excellent performance in video action recognition. Most existing works train each sampling group independently, or just fuse at the last level, which obviously ignore the continuity of action in temporal and the complementary information between action fragments. In this paper, a temporal segment connection network is proposed to overcome these limitations. On the one hand, the forget gate module of the long short-term memory (LSTM) network is used to establish feature-level connections between each sampling group. This not only strengthens the information transmission between the sampling groups to enhance the temporal connectivity, but also extracts the complementary information between the sampling groups to enhance the overall representation of the action. On the other hand, a bi-directional long short-term memory (Bi-LSTM) network is used to automatically evaluate the importance weights of each sampling group based on the deep feature sequence. The experimental results on UCF101 and HMDB51 datasets show that the proposed model can effectively improve the utilization rate of temporal information and the ability of overall action representation, thus significantly improves the accuracy of human action recognition.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    27
    References
    1
    Citations
    NaN
    KQI
    []