Bi-directional Attention Feature Enhancement for Video Instance Segmentation

2021 
As a recently proposed task, video instance segmentation (VIS) classifies, detects, segments, and tracks each instance in a given video, which is very useful for the driving environment perception of autonomous vehicles. In this paper, we propose a novel method called bi-directional attention feature enhancement (BAFE) to compensate for the limited feature processing in existing VIS works. BAFE contains a top-down attention branch and a bottom-up attention branch. It uses attention to perform two-way information transmission between high-level semantic features and low-level local features so that they are combined efficiently, and it can be utilized in any network that uses both high-level and low-level features. We add BAFE before the mask-specialized regression branch of a recent work, spatial information preservation for VIS (SipMask-VIS), to enhance the features used for mask generation. In addition, we introduce a modified path aggregation network (Mod-PAN) to further enhance features. Unlike existing works, which first train their models on image datasets and then use the pre-trained models to finish the VIS task, we obtain VIS results in an end-to-end way without any pre-training. Our method outperforms SipMask-VIS by an absolute gain of 2.5%, which demonstrates its effectiveness.
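The abstract does not give implementation details, but the core bi-directional idea can be illustrated with a minimal sketch: a top-down branch lets high-level semantic features gate low-level local features, while a bottom-up branch lets low-level features gate high-level features, after which the two streams are fused. The module name, layer layout, channel counts, and sigmoid spatial attention below are assumptions for illustration only, not the authors' exact BAFE design.

```python
# Minimal sketch of a bi-directional attention fusion module.
# NOTE: this is an illustrative reading of the abstract, not the paper's
# official BAFE implementation; all layer choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiDirectionalAttentionFusion(nn.Module):
    """Fuses a high-level (semantic) map with a low-level (local) map
    via a top-down and a bottom-up attention branch."""

    def __init__(self, channels: int):
        super().__init__()
        # Top-down branch: high-level features produce a spatial attention
        # map that re-weights the low-level features.
        self.top_down_att = nn.Conv2d(channels, 1, kernel_size=1)
        # Bottom-up branch: low-level features produce a spatial attention
        # map that re-weights the high-level features.
        self.bottom_up_att = nn.Conv2d(channels, 1, kernel_size=1)
        # Fusion of the two enhanced streams (assumed 3x3 conv).
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Resize the high-level map to the low-level spatial resolution.
        high_up = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                                align_corners=False)
        # Top-down: semantic context gates the local features.
        low_enhanced = low * torch.sigmoid(self.top_down_att(high_up))
        # Bottom-up: local detail gates the semantic features.
        high_enhanced = high_up * torch.sigmoid(self.bottom_up_att(low))
        # Concatenate and fuse the two enhanced streams.
        return self.fuse(torch.cat([low_enhanced, high_enhanced], dim=1))


if __name__ == "__main__":
    # Toy feature maps standing in for two pyramid levels
    # (equal channel counts are assumed for simplicity).
    low_feat = torch.randn(1, 256, 64, 64)
    high_feat = torch.randn(1, 256, 32, 32)
    fused = BiDirectionalAttentionFusion(256)(low_feat, high_feat)
    print(fused.shape)  # torch.Size([1, 256, 64, 64])
```

In this sketch the fused output keeps the low-level resolution, which matches the abstract's goal of enhancing features before the mask-specialized regression branch; where exactly the module is inserted in SipMask-VIS, and how Mod-PAN differs from the standard path aggregation network, is not specified in the abstract.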