Aggregating Spatio-temporal Context for Video Object Segmentation

2020 
In this paper, we focus on aggregating spatio-temporal contextual information for video object segmentation. Our approach exploits the spatio-temporal relationship among image regions by modelling the dependencies among the corresponding visual features with a spatio-temporal RNN. The spatio-temporal RNN is placed on top of a pre-trained CNN to simultaneously embed spatial and temporal information into the feature maps. Following the spatio-temporal RNN, we further construct an online adaptation module that adapts the learned model to segment specific objects in a given video. We show that our adaptation module can be optimized efficiently with closed-form solutions. Our experiments on two public datasets demonstrate that the proposed method performs favorably against state-of-the-art methods in terms of both efficiency and accuracy.
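The abstract does not specify the exact recurrent cell, so the following is only an illustrative sketch of the general idea: a convolutional GRU (here with 1x1, i.e. pointwise, convolutions implemented as shared matrix multiplies) applied over a sequence of per-frame CNN feature maps, so that the hidden state accumulates both spatial and temporal context. All shapes and weight names are assumptions, not the paper's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def convgru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """One ConvGRU update with 1x1 (pointwise) convolutions.

    h, x : (H, W, C) hidden state and input feature map.
    W*, U*: (C, C) weights shared across all spatial locations,
    so each matmul on the channel axis acts as a 1x1 convolution.
    """
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_tilde          # blended new state

# Run the recurrence over T frames of (hypothetical) CNN features.
rng = np.random.default_rng(0)
T, H, W, C = 4, 8, 8, 16
feats = rng.standard_normal((T, H, W, C)) * 0.1
weights = [rng.standard_normal((C, C)) * 0.1 for _ in range(6)]
h = np.zeros((H, W, C))
for t in range(T):
    h = convgru_step(h, feats[t], *weights)
print(h.shape)  # (8, 8, 16): a feature map enriched with temporal context
```

In a real system the pointwise convolutions would typically be replaced by k x k spatial convolutions so the hidden state also aggregates neighborhood context within each frame.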
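The "closed-form solution" for the online adaptation module is not spelled out in the abstract; one standard choice that admits such a solution is ridge regression fit on first-frame features against the given object mask. The sketch below illustrates only that pattern; the feature dimensions, labels, and the regularizer `lam` are all assumptions for the example.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# Toy example: per-pixel features from an annotated first frame,
# with binary mask labels (hypothetical data, for illustration only).
rng = np.random.default_rng(1)
N, D = 200, 8
X = rng.standard_normal((N, D))
w_star = rng.standard_normal(D)
y = (X @ w_star > 0).astype(float)   # synthetic ground-truth mask
w = fit_ridge(X, y, lam=0.1)
pred = (X @ w > 0.5).astype(float)   # segment-by-thresholding
acc = (pred == y).mean()
```

Because the solution is a single linear solve rather than an iterative optimization, this kind of adaptation can be recomputed cheaply as new frames arrive, which is consistent with the efficiency claim in the abstract.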