Bidirectional LSTM with saliency-aware 3D-CNN features for human action recognition

2021 
Deep convolutional neural networks (DCNNs) and recurrent neural networks (RNNs) have received increasing attention in multimedia understanding and have achieved remarkable action recognition performance. However, videos contain rich motion information at varying scales, and existing recurrent pipelines fail to capture long-term motion dynamics in videos with diverse motion scales and complex actions performed by multiple actors. Attending to contextual and salient features is more important than mapping a video frame into a static video representation. This work presents a novel pipeline that analyzes and processes video information with a 3D convolutional (C3D) network and a newly introduced deep bidirectional LSTM. Like the popular two-stream ConvNet, we adopt a two-stream framework, with one modification: the optical-flow stream is replaced by a saliency-aware stream to avoid its computational cost. First, we generate a saliency-aware video stream by applying a saliency-detection method. Second, a two-stream 3D convolutional network (C3D) is applied to the two kinds of streams, i.e., the RGB and saliency-aware video streams, to extract both spatial and semantic information. Next, a deep bidirectional LSTM network learns the sequential deep temporal dynamics. Finally, a time-series pooling layer and a softmax layer classify the human activity. The proposed system can learn long-term temporal dependencies and is able to predict complex human actions. Experimental results demonstrate significant improvements in action recognition accuracy on several benchmark datasets.
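To make the described pipeline concrete, the following is a minimal PyTorch sketch of the overall architecture: two 3D-CNN streams (RGB and a precomputed saliency-aware clip), a deep bidirectional LSTM over the concatenated per-timestep features, temporal pooling, and a classification layer. This is not the authors' implementation; the class names (C3DBackbone, TwoStreamBiLSTM), layer sizes, feature dimensions, and the use of mean pooling as the time-series pooling are illustrative assumptions.

```python
# Illustrative sketch only; layer sizes, names, and mean pooling are assumptions,
# not details taken from the paper.
import torch
import torch.nn as nn


class C3DBackbone(nn.Module):
    """Toy 3D-CNN mapping a clip (B, C, T, H, W) to per-timestep features (B, T, feat_dim)."""
    def __init__(self, in_channels=3, feat_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),          # downsample space, keep temporal length
            nn.Conv3d(64, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),           # pool away spatial dims only
        )

    def forward(self, x):
        f = self.features(x)                               # (B, feat_dim, T, 1, 1)
        return f.squeeze(-1).squeeze(-1).transpose(1, 2)   # (B, T, feat_dim)


class TwoStreamBiLSTM(nn.Module):
    """RGB + saliency-aware streams -> C3D features -> deep BiLSTM -> temporal pooling -> classifier."""
    def __init__(self, num_classes, feat_dim=256, hidden=256, lstm_layers=2):
        super().__init__()
        self.rgb_stream = C3DBackbone(3, feat_dim)
        self.sal_stream = C3DBackbone(3, feat_dim)          # saliency-aware clip, assumed 3-channel
        self.bilstm = nn.LSTM(
            input_size=2 * feat_dim, hidden_size=hidden,
            num_layers=lstm_layers, batch_first=True, bidirectional=True,
        )
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, rgb_clip, sal_clip):
        feats = torch.cat([self.rgb_stream(rgb_clip),
                           self.sal_stream(sal_clip)], dim=-1)  # (B, T, 2*feat_dim)
        seq, _ = self.bilstm(feats)                             # (B, T, 2*hidden)
        pooled = seq.mean(dim=1)                                # time-series pooling (mean over T)
        return self.classifier(pooled)                          # logits; softmax applied in the loss


if __name__ == "__main__":
    model = TwoStreamBiLSTM(num_classes=101)
    rgb = torch.randn(2, 3, 16, 112, 112)   # (batch, channels, frames, height, width)
    sal = torch.randn(2, 3, 16, 112, 112)   # saliency-aware clip, same shape as RGB
    print(model(rgb, sal).shape)            # torch.Size([2, 101])
```

In this sketch the saliency-aware stream is assumed to be produced offline by a separate saliency-detection step and fed in as a second clip tensor; the bidirectional LSTM output dimension is twice the hidden size because forward and backward passes are concatenated.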