Robust Action Recognition via Borrowing Information Across Video Modalities

2015 
Recent advances in imaging devices have opened new opportunities for video content analysis and understanding. Next-generation cameras, such as depth or binocular cameras, capture diverse information and complement conventional 2D RGB cameras. Thus, investigating the resulting multimodal videos generally facilitates related applications. However, the limitations of these emerging cameras, such as short effective distances, high costs, or long response times, degrade their applicability and currently make them unavailable for online use in practice. In this paper, we provide an alternative scenario to address this problem, and illustrate it with the task of recognizing human actions. In particular, we aim at improving the accuracy of action recognition in RGB videos with the aid of one additional RGB-D camera. Since RGB-D cameras, such as Kinect, are typically not applicable in a surveillance system due to their short effective distance, we instead collect a database offline, in which not only the RGB videos but also the depth maps and the skeleton data of actions are jointly available. The proposed approach can adapt to the inter-database variations and enable the borrowing of visual knowledge across different video modalities. Each action to be recognized in its RGB representation is then augmented with the borrowed depth and skeleton features. Our approach is comprehensively evaluated on five benchmark data sets of action recognition. The promising results show that the borrowed information leads to a remarkable boost in recognition accuracy.