A novel spatiotemporal attention enhanced discriminative network for video salient object detection

2021 
In contrast to image salient object detection, on which many achievements have been made, video salient object detection remains a considerable challenge. Not all features are useful in salient object detection, and some even cause interferences. In this paper, we propose a novel multiscale spatiotemporal ConvLSTM model based on an attention mechanism, which introduces space-based and channel-based attention mechanisms and improves the network’s capability to extract high-level semantic information and low-level spatial structural features. First, to obtain more effective spatiotemporal information, a ConvLSTM module embedded with an attention mechanism (CSAtt-ConvLSTM) is designed at higher layers of the network to weight salient features of the extracted spatiotemporal consistency. Second, a multiscale attention (MSA) module for distinguishing features is designed, which introduces two attention mechanisms: channel-wise attention (CA) units and spatial-wise attention (SA) units. The CA and SA units are used after high-level feature mapping obtained by the CSAtt-ConvLSTM module and shallow feature mapping, respectively, and then their outputs are fused as final output feature maps. A large number of experiments on multiple datasets verified the effectiveness of our proposed model, which reached a real-time speed on a single GPU of 20 fps.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    63
    References
    0
    Citations
    NaN
    KQI
    []