Massive-scale complicated human action recognition: Theory and applications

2021 
Abstract Recognizing diverse human actions in video is a challenging problem and one of the key tasks in computer vision. It has received extensive attention from AI researchers Bakker et al. (2003), Bruderlin and Williams (1995), Cardle et al. (2003), Carlsson (1996, 1999), Clausen and Kurth (2004), and it has important applications in human behavior analysis, artificial intelligence, and video surveillance. Compared with still-image classification, the temporal component of video provides important cues for recognition, so many human actions can be recognized from motion information; in addition, video provides natural data augmentation for individual frames. For action recognition from video, appearance and temporal dynamics are two key and complementary cues. In this work, we first formulate a human action recognition framework based on 2D spatial feature fusion of Kinect skeleton data. This method uses the structure and spatial geometry of the human body to represent body configuration and extract features, hierarchically combines active-action and auxiliary-action features from the two spatial dimensions, and classifies them with the widely used support vector machine and hidden Markov model. However, background clutter, viewpoint change, scale change, varying lighting conditions, and camera motion make such information difficult to extract, so designing effective representations while learning the classification information of behavior categories is the key to addressing these challenges. We therefore also propose a video-based human action recognition method built on the ResNeXt network. Using both RGB and optical-flow data, the proposed ResNeXt network extracts richer appearance and temporal features of human actions and thus classifies actions more accurately, while a temporal segmentation scheme samples the video over a longer time range so that long-range temporal information is better exploited. Comprehensive experimental results show that the proposed method improves performance on the UCF101 and HMDB51 action recognition datasets to a certain extent.
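
The two-stream ResNeXt pipeline described in the abstract can be illustrated with a short PyTorch sketch. This is a minimal sketch under stated assumptions: torchvision's resnext50_32x4d stands in for the paper's backbone, and the segment count, optical-flow stack depth, and the simple averaging fusion of the RGB and flow streams are illustrative choices rather than the authors' exact configuration.

# Minimal sketch of a two-stream ResNeXt with temporal-segment sampling.
# Assumption: torchvision's resnext50_32x4d as the backbone; segment count,
# flow stack depth, and fusion are illustrative, not the paper's settings.
import torch
import torch.nn as nn
from torchvision.models import resnext50_32x4d

NUM_CLASSES = 101      # e.g. UCF101
NUM_SEGMENTS = 3       # temporal segments sampled per video (assumed)
FLOW_STACK = 10        # stacked optical-flow fields -> 2*FLOW_STACK channels

def make_stream(in_channels: int) -> nn.Module:
    """Build one ResNeXt stream and adapt its input stem and classifier head."""
    net = resnext50_32x4d(weights=None)
    if in_channels != 3:
        # Replace the stem so the flow stream accepts stacked flow fields.
        net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                              stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, NUM_CLASSES)
    return net

class TwoStreamResNeXt(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = make_stream(3)
        self.flow_stream = make_stream(2 * FLOW_STACK)

    def forward(self, rgb, flow):
        # rgb:  (B, NUM_SEGMENTS, 3, H, W)
        # flow: (B, NUM_SEGMENTS, 2*FLOW_STACK, H, W)
        b, s = rgb.shape[:2]
        rgb_logits = self.rgb_stream(rgb.flatten(0, 1)).view(b, s, -1)
        flow_logits = self.flow_stream(flow.flatten(0, 1)).view(b, s, -1)
        # Average over segments (segmental consensus), then fuse the two streams.
        return rgb_logits.mean(1) + flow_logits.mean(1)

if __name__ == "__main__":
    model = TwoStreamResNeXt()
    rgb = torch.randn(2, NUM_SEGMENTS, 3, 224, 224)
    flow = torch.randn(2, NUM_SEGMENTS, 2 * FLOW_STACK, 224, 224)
    print(model(rgb, flow).shape)  # torch.Size([2, 101])

Replacing the flow stream's first convolution lets the same backbone consume stacked optical-flow fields, and averaging the per-segment logits mimics the segmental consensus used by temporal-segment approaches to cover a longer time range of the video.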