DT-3DResNet-LSTM: An Architecture for Temporal Activity Recognition in Videos

2018 
Human activity recognition is an important and still largely unsolved problem in computer vision. While advances such as deep learning have produced strong results on image tasks, recognizing behavior in videos remains difficult because of substantial noise and clutter. We propose DT-3DResNet-LSTM, an architecture that classifies and temporally localizes activities in videos. We detect objects in video frames and feed the detections to an object tracking model, which provides data association among multiple objects across adjacent frames. The clipped video frames of each tracked object are then passed through a 3D Convolutional Neural Network (CNN) to extract features, and a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM) network, is trained to classify the video clips. Finally, we post-process the output of the RNN (LSTM) model to obtain the final classification of the input video and to determine the temporal localization of the activity.
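The abstract's last step, turning per-clip LSTM labels into a video-level class and temporal segments, can be sketched in plain Python. This is a hypothetical post-processing rule, not the paper's exact procedure: it majority-votes the non-background clip labels for the video-level class and reports contiguous runs of non-background clips as temporal segments.

```python
from collections import Counter

def aggregate_clip_predictions(clip_labels, background=0):
    """Hypothetical sketch (assumed rule, not the paper's): fuse per-clip
    LSTM labels into one video-level class plus temporal segments."""
    # Video-level class: most frequent non-background clip label.
    counts = Counter(l for l in clip_labels if l != background)
    video_label = counts.most_common(1)[0][0] if counts else background

    # Temporal localization: contiguous non-background runs as
    # (start_clip, end_clip) index pairs, inclusive.
    segments, start = [], None
    for i, l in enumerate(clip_labels):
        if l != background and start is None:
            start = i
        elif l == background and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(clip_labels) - 1))
    return video_label, segments

# Example: clips 2-4 and clip 6 are predicted as class 2.
label, segs = aggregate_clip_predictions([0, 0, 2, 2, 2, 0, 2, 0])
print(label, segs)  # → 2 [(2, 4), (6, 6)]
```

In practice the fusion rule could also weight clips by LSTM confidence; the simple vote above just illustrates how clip-level outputs yield both a classification and a temporal extent.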