Searching For Desired Person Doing Desired Action based on Visual and Audio Feature in Large Scale Video Database

2020 
Finding a person performing a specific action in a video database is a challenging problem because the result must be correct at the instance level: the specific person must be doing the specified action. Although there has been much work on face recognition and on action recognition, most of it focuses on only one of these tasks. In this paper, we formulate the problem as instance retrieval, where the input is a query consisting of examples of the target person and examples of the desired action, and the output is a ranked list of positive shots. We propose a simple but efficient person-action retrieval system that combines multimodal features, both visual and audio, to handle various types of instances by making use of whichever visual or audio cues are available. Evaluation on the large-scale BBC EastEnders dataset, where the system ranked 3rd out of 6 teams in TRECVID INS 2019, demonstrates the effectiveness of the proposed method.
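As a rough illustration of the instance-retrieval formulation, the sketch below ranks shots by combining a person-match score with an action-match score. It is not the paper's pipeline: the feature vectors, the max-over-examples matching, the product fusion rule, and every name (`rank_shots`, `best_match`, the toy `shots` dictionary) are assumptions made for illustration only.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(examples, feature):
    """Score a shot feature by its closest query example (assumed rule)."""
    return max(cosine(example, feature) for example in examples)


def rank_shots(shots, person_examples, action_examples):
    """Return (shot_id, score) pairs sorted from best to worst.

    Assumption: each shot carries one 'person' feature and one 'action'
    feature (the latter could come from a visual or an audio model), and
    the two scores are fused by a simple product so that a shot must
    match both the person and the action to rank highly.
    """
    scored = []
    for shot_id, feats in shots.items():
        person_score = best_match(person_examples, feats["person"])
        action_score = best_match(action_examples, feats["action"])
        scored.append((shot_id, person_score * action_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: 2-D vectors stand in for real face / action embeddings.
shots = {
    "shot_1": {"person": [1.0, 0.0], "action": [0.0, 1.0]},  # right person, wrong action
    "shot_2": {"person": [1.0, 0.0], "action": [1.0, 0.0]},  # right person, right action
    "shot_3": {"person": [0.0, 1.0], "action": [1.0, 0.0]},  # wrong person, right action
}
ranking = rank_shots(
    shots,
    person_examples=[[1.0, 0.0]],
    action_examples=[[1.0, 0.0]],
)
print(ranking[0][0])
```

In this toy setting only the shot that matches both the person and the action examples receives a non-zero score, which mirrors the instance-level correctness requirement described above.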