Predicting Actions in Videos and Action-Based Segmentation Using Deep Learning

2021 
In this paper, we propose a technique to recognize multiple actions in a video using deep learning. The proposed approach is concerned with interpreting the overall context of a video and transforming it into one or more appropriate actions. To cope with multiple actions in a video, our proposed technique first determines the individual segments/shots in a video using intersections of color histograms. The segmented parts are then fed to the action recognition system, comprising a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network trained on our action vocabulary. The segments are then labeled according to their predicted actions, and a compact set of distinct actions is produced. Using the corpus generated by the shot detection phase, which includes the location of keyframes in shots and the start/end timestamps of each shot, we can also perform video segmentation based on an action query. Hence, the proposed technique can be used for tasks such as content censoring, on-demand scene retrieval, video summarization, and query-based scene/video retrieval. The proposed technique also stands apart from existing approaches, which either do not take motion information into account for action prediction or do not perform action-based video segmentation. The experimental results presented in this paper show that the proposed technique not only finds the complete set of actions present in a video but can also find all the parts of a video relevant to an action query.
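The shot-segmentation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes frames are given as RGB arrays, computes a normalized per-channel color histogram for each frame, and declares a shot boundary wherever the histogram intersection between consecutive frames falls below a chosen threshold (the bin count and threshold value here are illustrative assumptions, not taken from the paper).

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Concatenated per-channel color histogram, normalized to sum to 1."""
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(frame.shape[-1])
    ]).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Intersection of two normalized histograms; 1.0 = identical, 0.0 = disjoint."""
    return np.minimum(h1, h2).sum()

def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices i such that a new shot starts at frames[i]."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_intersection(hists[i - 1], hists[i]) < threshold]

# Toy example: three dark frames followed by three bright frames
# should yield a single boundary at the transition.
frames = ([np.full((8, 8, 3), 10, dtype=np.uint8)] * 3 +
          [np.full((8, 8, 3), 200, dtype=np.uint8)] * 3)
print(detect_shot_boundaries(frames))  # → [3]
```

In practice the threshold would be tuned on the target video corpus, and the resulting boundaries (with keyframe locations and start/end timestamps) form the shot corpus that the paper's action-query segmentation operates on.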