MMNet: Multi-Stage and Multi-Scale Fusion Network for RGB-D Salient Object Detection

2020 
Most existing RGB-D salient object detection (SOD) methods directly extract and fuse raw features from RGB and depth backbones. Such methods are easily restricted by low-quality depth maps and redundant cross-modal features. To effectively capture multi-scale cross-modal fusion features, this paper proposes a novel Multi-Stage and Multi-Scale Fusion Network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Inspired by the stage doctrine of color vision in the human visual system, the proposed CMFM explores useful and important feature representations in its feature response stage and integrates them into effective cross-modal fusion features in its adversarial combination stage. Moreover, the proposed BMD learns combinations of cross-modal fusion features across multiple levels to capture both local and global information about salient objects, further boosting the performance of the proposed method. Comprehensive experiments demonstrate that the proposed method consistently outperforms 14 state-of-the-art methods on six popular RGB-D datasets under eight different evaluation metrics.
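The abstract itself does not include code, but the two-component design (per-stage cross-modal fusion followed by a bi-directional multi-scale decoder) can be illustrated with a minimal PyTorch-style sketch. All module names, channel widths, and fusion operations below are assumptions chosen for clarity, not the authors' implementation.

```python
# Illustrative sketch only: module names, channel sizes, and fusion
# operations are assumptions, not the authors' implementation of MMNet.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CMFM(nn.Module):
    """Cross-modal fusion at one backbone stage (hypothetical form).

    Feature-response step: a channel gate re-weights each modality;
    combination step: the re-weighted features are merged into a single
    cross-modal fusion feature.
    """

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, rgb, depth):
        w = self.gate(torch.cat([rgb, depth], dim=1))  # per-channel response
        return self.fuse(torch.cat([rgb * w, depth * (1 - w)], dim=1))


class BMD(nn.Module):
    """Bi-directional multi-scale decoder (hypothetical form).

    A top-down pass propagates global context to finer scales, a bottom-up
    pass feeds local detail back to coarser ones, and a 1x1 head predicts
    the saliency map from the finest fused feature.
    """

    def __init__(self, channels):
        super().__init__()
        self.head = nn.Conv2d(channels, 1, 1)

    def forward(self, feats):  # feats: fine -> coarse, equal channel width
        # Top-down: coarse (global) features refine finer ones.
        for i in range(len(feats) - 2, -1, -1):
            up = F.interpolate(feats[i + 1], size=feats[i].shape[2:],
                               mode="bilinear", align_corners=False)
            feats[i] = feats[i] + up
        # Bottom-up: fine (local) features refine coarser ones.
        for i in range(1, len(feats)):
            down = F.interpolate(feats[i - 1], size=feats[i].shape[2:],
                                 mode="bilinear", align_corners=False)
            feats[i] = feats[i] + down
        return torch.sigmoid(self.head(feats[0]))


if __name__ == "__main__":
    fuse, decode = CMFM(64), BMD(64)
    rgb_feats = [torch.randn(1, 64, s, s) for s in (64, 32, 16)]
    depth_feats = [torch.randn(1, 64, s, s) for s in (64, 32, 16)]
    fused = [fuse(r, d) for r, d in zip(rgb_feats, depth_feats)]
    print(decode(fused).shape)  # torch.Size([1, 1, 64, 64])
```

The complementary gating (`w` for RGB, `1 - w` for depth) is one simple way to realize the "adversarial combination" idea, letting the more informative modality dominate each channel when depth maps are unreliable.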