Saliency-based object recognition in video
2013
In this paper we study the problem of object recognition in egocentric video recorded with cameras worn by persons. This task has gained much attention in recent years, since it has turned out to be a main building block for action-recognition systems in applications involving wearable cameras, such as tele-medicine or life-logging. In these scenarios, an action can be effectively defined as a sequence of manipulated or observed objects, so that object recognition becomes a key stage of the system. Furthermore, video summarization of such content is also driven by the appearance of semantic objects in the camera's field of view.

One of the particularities of first-person video is that it usually presents a strong differentiation between active objects (manipulated or observed by the user wearing the camera) and passive objects (associated with the background). In addition, spatial, temporal and geometric cues in the video content may help to identify the active elements in the scene. These saliency features are related to models of the Human Visual System, but also to the motor coordination of eye, hand and body movements. In this paper, we discuss the automatic generation of saliency maps in video, and introduce a method that extends the well-known Bag-of-Words (BoW) paradigm with saliency information. We have assessed our proposal on several egocentric video datasets, demonstrating that it not only improves on the BoW baseline, but also achieves the state-of-the-art performance of, e.g., part-based models, with noticeably lower computational cost. The approach also holds promise for other user-generated mobile content.