Video Saliency Estimation via Encoding Deep Spatiotemporal Saliency Cues

2018 
Spatial and temporal information are two important cues for video saliency estimation. In view of the shortcomings of current video saliency estimation methods, we propose a novel video saliency estimation model that encodes deep spatiotemporal saliency cues. First, we train a compact and effective saliency detection network based on VGG16Net to extract deep spatial saliency cues from the video sequence frame by frame at the semantic level. Second, we adopt FlowNet to obtain deep motion features, which are incorporated into our global contrast model to fully extract the temporal saliency cues. Third, we construct a spatiotemporal saliency optimization framework based on stacked auto-encoders that encodes the concatenated deep spatiotemporal saliency cues to infer the final saliency values and guarantee spatiotemporal consistency. Experimental analyses on two public datasets verify the robustness and effectiveness of our method.
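The temporal branch described above rests on a global contrast model over deep motion features. As a rough illustration of the idea (not the paper's actual formulation, whose details are not given in the abstract), the sketch below scores each region by its motion-feature distance to all other regions, with spatially nearer regions weighted more heavily; the function name, distance measure, and Gaussian spatial weighting are all assumptions.

```python
import math

def global_contrast_saliency(features, positions, sigma=0.5):
    """Hypothetical global-contrast saliency sketch.

    features  -- per-region motion feature vectors (e.g. pooled FlowNet outputs)
    positions -- per-region normalized (x, y) centers
    sigma     -- assumed bandwidth of the Gaussian spatial weighting
    Returns one saliency value per region, normalized to [0, 1].
    """
    n = len(features)
    saliency = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            if i == j:
                continue
            # contrast in the deep motion-feature space
            feat_dist = math.dist(features[i], features[j])
            # spatially closer regions contribute more to the contrast
            spatial_w = math.exp(-math.dist(positions[i], positions[j]) ** 2 / sigma)
            s += spatial_w * feat_dist
        saliency.append(s)
    # min-max normalization so the most contrasting region scores near 1
    lo, hi = min(saliency), max(saliency)
    return [(v - lo) / (hi - lo + 1e-12) for v in saliency]
```

For example, a region whose motion feature differs sharply from every other region (an object moving against a static background) receives the highest score, which is the behavior a temporal saliency cue is meant to capture.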