End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation

2021 
Acquiring pixel-level object locations from only image-level annotations is a central challenge in weakly supervised semantic segmentation (WSSS). Single-stage methods learn image- and pixel-level labels simultaneously to avoid complicated multi-stage computation and sophisticated training procedures. In this paper, we argue that using a single model to accomplish both image- and pixel-level classification forces a trade-off between the two objectives and consequently weakens recognition capability, because the image-level task tends to learn position-independent features while the pixel-level task is position-sensitive. Hence, we propose an effective encoder-decoder framework that explores object boundaries and resolves this dilemma: the encoder and decoder learn position-independent and position-sensitive features independently during end-to-end training. In addition, a global soft pooling is proposed to suppress the activation of background pixels during encoder training and further improve the quality of the class activation maps (CAMs). The edge annotations for decoder training are synthesized from high-confidence CAMs and therefore require no extra supervision. Extensive experiments on the PASCAL VOC 2012 dataset demonstrate that our method achieves state-of-the-art performance among end-to-end approaches, reaching 63.6% and 65.7% mIoU on the val and test sets, respectively.
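The abstract does not give the exact form of the global soft pooling; a common reading is a softmax-weighted spatial pooling, in which each location's contribution to the class score is weighted by its own normalized activation, so weakly activated background pixels contribute little. The sketch below illustrates that interpretation in PyTorch; the function name `global_soft_pool` and the `temperature` parameter are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def global_soft_pool(cam: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Softmax-weighted spatial pooling (one assumed reading of "global soft pooling").

    cam: class activation maps of shape (B, C, H, W).
    Returns per-class scores of shape (B, C).

    Each spatial location is weighted by the softmax of its own activation,
    so strongly activated (foreground) pixels dominate the pooled score while
    weakly activated background pixels are suppressed.
    """
    b, c, h, w = cam.shape
    flat = cam.view(b, c, h * w)
    weights = F.softmax(flat / temperature, dim=-1)  # per-class spatial weighting
    return (flat * weights).sum(dim=-1)              # weighted sum over locations


# Minimal usage example with random activations standing in for backbone output.
if __name__ == "__main__":
    cam = torch.randn(2, 20, 32, 32)   # 20 foreground classes, e.g. PASCAL VOC
    scores = global_soft_pool(cam)
    print(scores.shape)                # torch.Size([2, 20])
```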