Saliency-based object recognition in video

2013 
In this paper we study the problem of object recognition in egocentric video recorded with body-worn cameras. This task has gained much attention in recent years, since it has turned out to be a key building block for action recognition systems in applications involving wearable cameras, such as tele-medicine or lifelogging. In these scenarios, an action can be effectively defined as a sequence of manipulated or observed objects, so that object recognition becomes a central stage of the system. Furthermore, video summarization of such content is also driven by the appearance of semantic objects in the camera's field of view. One particularity of first-person video is that it usually presents a strong differentiation between active objects (manipulated or observed by the user wearing the camera) and passive objects (associated with the background). In addition, spatial, temporal, and geometric cues present in the video content may help to identify the active elements in the scene. These saliency features are related not only to models of the human visual system, but also to the motor coordination of eye, hand, and body movements. In this paper, we discuss the automatic generation of saliency maps in video and introduce a method that extends the well-known Bag-of-Words (BoW) paradigm with saliency information. We have assessed our proposal on several egocentric video datasets, demonstrating that it not only improves on the BoW baseline, but also matches the state-of-the-art performance of, e.g., part-based models, at noticeably lower computational cost. The approach also holds promise for other user-generated mobile content.
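
To make the core idea concrete, the following is a minimal sketch of saliency-weighted BoW pooling: each local descriptor's vote in the visual-word histogram is scaled by the saliency value at its keypoint location, so features on passive background regions contribute little to the final descriptor. The function name, the hard-assignment step, and the L1 normalization are illustrative assumptions, not the exact formulation used in the paper.

    import numpy as np

    def saliency_weighted_bow(descriptors, positions, codebook, saliency_map):
        # descriptors : (N, D) local features (e.g. SIFT) for one frame
        # positions   : (N, 2) integer (row, col) keypoint coordinates
        # codebook    : (K, D) visual-word centers (e.g. from k-means)
        # saliency_map: (H, W) saliency values in [0, 1]

        # Hard-assign each descriptor to its nearest visual word.
        dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
        words = np.argmin(dists, axis=1)

        # Weight each descriptor's vote by the saliency at its keypoint.
        weights = saliency_map[positions[:, 0], positions[:, 1]]

        # Accumulate a saliency-weighted histogram over the K visual words.
        hist = np.bincount(words, weights=weights, minlength=codebook.shape[0])

        # L1-normalize so frames with different feature counts are comparable.
        total = hist.sum()
        return hist / total if total > 0 else hist

The same weighting extends naturally to soft assignment or spatial-pyramid pooling; the unweighted BoW baseline is recovered when the saliency map is uniformly 1.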