A Novel Feature Extractor for Human Action Recognition in Visual Question Answering

2021 
Abstract
Recognizing and classifying human actions in video clips is a powerful technology for surveillance applications. However, most state-of-the-art approaches for this task cannot be deployed in real-time applications without introducing a critical delay. We therefore propose a fast human action recognition method for visual question answering, based on a novel feature extractor of our own design, 2D pose estimation, and machine learning techniques. Our extractor computes features from the distances, angles, and positions of detected anatomical keypoints. We evaluated our work on the UCF101 dataset, which contains 13,320 videos of realistic human actions collected from YouTube. Combined with the Complement Naive Bayes classifier, the proposed feature extractor reached a mean Average Precision (mAP) of 62.03% while processing 5.26 frames per second, making it faster than most compared methods while achieving a reasonable mAP.
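The abstract does not say which keypoint pairs or joint angles the extractor uses, or how the features are normalized. The sketch below shows one plausible way to turn a single person's 2D keypoints into position, distance, and angle features. The toy 5-point skeleton, the joint triplets in `EXAMPLE_TRIPLETS`, the bounding-box normalization, and the function names are all hypothetical choices for illustration, not the authors' method.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# A 2D keypoint (x, y) in image coordinates, or None if the pose
# estimator did not detect it.
Keypoint = Optional[Tuple[float, float]]

# Hypothetical joint triplets (a, b, c): the angle is measured at b.
# Indices assume a toy 5-point skeleton; adapt them to the pose model used.
EXAMPLE_TRIPLETS = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]


def _angle_at(a, b, c):
    """Angle ABC in radians, in [0, pi]; 0.0 if a segment has zero length."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def extract_pose_features(
    keypoints: Sequence[Keypoint],
    triplets: Sequence[Tuple[int, int, int]] = EXAMPLE_TRIPLETS,
) -> List[float]:
    """Position, distance and angle features for one detected person.

    Values are scaled to roughly [0, 1]; undetected keypoints yield 0.0.
    """
    k = len(keypoints)
    n_features = 2 * k + k * (k - 1) // 2 + len(triplets)
    found = [p for p in keypoints if p is not None]
    if not found:
        return [0.0] * n_features

    # Bounding box of the detected keypoints, used for scale invariance.
    xs, ys = [p[0] for p in found], [p[1] for p in found]
    x0, y0 = min(xs), min(ys)
    w = max(max(xs) - x0, 1e-6)
    h = max(max(ys) - y0, 1e-6)
    diag = math.hypot(w, h)

    features: List[float] = []
    # 1) Positions relative to the bounding box.
    for p in keypoints:
        features += [(p[0] - x0) / w, (p[1] - y0) / h] if p else [0.0, 0.0]
    # 2) Pairwise Euclidean distances, normalized by the box diagonal.
    for p, q in combinations(keypoints, 2):
        features.append(math.dist(p, q) / diag if p and q else 0.0)
    # 3) Joint angles, divided by pi.
    for ia, ib, ic in triplets:
        a, b, c = keypoints[ia], keypoints[ib], keypoints[ic]
        features.append(_angle_at(a, b, c) / math.pi if a and b and c else 0.0)
    return features


if __name__ == "__main__":
    # Toy pose: a straight "arm" bending upward, last keypoint undetected.
    pose = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), None]
    print(extract_pose_features(pose)[-3:])  # angle features
```

Keeping every feature non-negative is a deliberate choice here: if per-frame vectors like these were fed to scikit-learn's `ComplementNB`, that estimator rejects negative input values. How frames are aggregated into a video-level prediction is not described in the abstract and is left out of this sketch.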