Human Action Recognition Based on a Two-stream Convolutional Network Classifier

2017 
Video capture devices have become easier to operate, more portable, and cheaper. This has made it easy to store and transmit large amounts of media such as video, which in turn has driven the need for automated analysis that does not depend on human assistance for the evaluation and exhaustive search of videos. Virtual reality, robotics, tele-medicine, human-machine interfaces and tele-surveillance are applications of such techniques. This paper describes a method for human action recognition in videos using two convolutional neural networks (CNNs): a Spatial Stream, trained on individual video frames, and a Temporal Stream, trained on stacks of Dense Optical Flow (DOF). The two streams were trained separately, and from each we generated a classification histogram based on its most frequent class assignments. For the final classification, these histograms were combined to produce a single output. The technique was tested on two public action video datasets, Weizmann and UCF Sports. The Spatial stream achieved 84.44% accuracy on the Weizmann dataset and 78.46% on the UCF Sports dataset; combining the two networks raised accuracy on the Weizmann dataset to 91.11%.
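The pipeline described in the abstract (per-frame classification for the spatial stream, stacked dense optical flow for the temporal stream, and histogram-based fusion) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the Farneback flow routine, the stack size of 10, and all function names are choices made here for the example; the abstract does not specify them.

```python
import numpy as np
import cv2

def dense_flow_stack(frames, stack_size=10):
    """Build a 2*stack_size-channel temporal-stream input by stacking the
    horizontal and vertical dense optical flow fields of consecutive
    grayscale frames (Farneback flow and stack_size=10 are assumptions)."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames[:stack_size + 1]]
    channels = []
    for prev, nxt in zip(grays, grays[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        channels.append(flow[..., 0])  # horizontal displacement field
        channels.append(flow[..., 1])  # vertical displacement field
    return np.stack(channels, axis=-1)  # shape: (H, W, 2 * stack_size)

def class_histogram(predictions, num_classes):
    """Count how often each class was assigned by one stream."""
    return np.bincount(predictions, minlength=num_classes)

def fuse_streams(spatial_preds, temporal_preds, num_classes):
    """Combine the two streams by summing their class histograms and
    returning the overall most frequent class as the video label."""
    hist = (class_histogram(spatial_preds, num_classes)
            + class_histogram(temporal_preds, num_classes))
    return int(np.argmax(hist))

# Example: per-frame labels from each stream vote for the video-level label.
spatial_preds = np.array([2, 2, 5, 2])   # hypothetical spatial-stream outputs
temporal_preds = np.array([2, 5, 5, 2])  # hypothetical temporal-stream outputs
print(fuse_streams(spatial_preds, temporal_preds, num_classes=10))  # -> 2
```

Summing the two histograms before taking the argmax is one plausible reading of "combined to produce a single output"; the abstract does not specify the exact combination rule.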