Learning Token-Aligned Representations With Multimodel Transformers for Different-Resolution Change Detection

2022 
Different-resolution change detection (DRCD) has become an urgent problem to solve, with great potential in rapid monitoring applications such as disaster assessment and urban expansion. In DRCD tasks, the bitemporal inputs are given at different resolutions, so conventional change detection (CD) methods cannot be applied directly. Previous studies have attempted to deal with this problem by reconstructing the low-resolution (LR) image into a high-resolution (HR) one via interpolation or super-resolution (SR). However, these solutions are limited by the availability of training data, making it hard to meet varied application needs. Moreover, such image-level strategies ignore the interaction and alignment of high-level features. Therefore, we propose a new approach based on multimodel Transformers (MM-Trans), which bridges the resolution gap between bitemporal inputs in DRCD tasks from the perspective of feature alignment. In MM-Trans, a weight-unshared feature extractor is first utilized to precisely capture the features of the different-resolution inputs. Then, a spatial-aligned Transformer (sp-Trans) is introduced to align the LR-image features to the same size as the HR-image ones; this alignment is optimized in a learnable way by an auxiliary token loss. After that, a semantic-aligned Transformer (se-Trans) is adopted, in which the bitemporal features are further interacted and aligned semantically. Finally, a prediction head is employed to obtain fine-grained change results. Experiments conducted on three common CD datasets, CDD, S2Looking, and HTCD, show the advantages of MM-Trans and fully demonstrate its potential in DRCD tasks.
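The pipeline described in the abstract can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: all module and variable names are assumptions, the weight-unshared backbones are replaced by pre-extracted token inputs, and the auxiliary token loss is omitted. The sp-Trans is approximated here with learned queries cross-attending to LR tokens, and the se-Trans with joint self-attention over the concatenated bitemporal tokens.

```python
import torch
import torch.nn as nn


class SpatialAlignTransformer(nn.Module):
    """Sketch of sp-Trans: map LR tokens onto the HR token grid via
    learned queries and cross-attention (assumed mechanism)."""

    def __init__(self, dim, hr_tokens, heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(hr_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, lr_tokens):
        # Expand queries to the batch and attend over the LR tokens.
        q = self.queries.unsqueeze(0).expand(lr_tokens.size(0), -1, -1)
        aligned, _ = self.attn(q, lr_tokens, lr_tokens)
        return aligned  # (B, hr_tokens, dim)


class SemanticAlignTransformer(nn.Module):
    """Sketch of se-Trans: joint self-attention over concatenated
    bitemporal tokens so the two branches interact semantically."""

    def __init__(self, dim, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 2,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, hr_tokens, aligned_lr_tokens):
        n = hr_tokens.size(1)
        fused = self.encoder(torch.cat([hr_tokens, aligned_lr_tokens], dim=1))
        return fused[:, :n], fused[:, n:]  # split back into the two branches


class MMTransSketch(nn.Module):
    """End-to-end sketch: sp-Trans -> se-Trans -> per-token change head."""

    def __init__(self, dim=32, hr_tokens=64):
        super().__init__()
        self.sp_trans = SpatialAlignTransformer(dim, hr_tokens)
        self.se_trans = SemanticAlignTransformer(dim)
        self.head = nn.Linear(2 * dim, 1)  # change probability per HR token

    def forward(self, hr_tokens, lr_tokens):
        aligned = self.sp_trans(lr_tokens)
        f_hr, f_lr = self.se_trans(hr_tokens, aligned)
        logits = self.head(torch.cat([f_hr, f_lr], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)  # (B, hr_tokens)


# Toy usage: tokens stand in for backbone features of the two images.
model = MMTransSketch(dim=32, hr_tokens=64)
hr = torch.randn(2, 64, 32)  # HR image: 64 tokens of width 32
lr = torch.randn(2, 16, 32)  # LR image: only 16 tokens (coarser grid)
change_map = model(hr, lr)   # per-token change scores on the HR grid
```

Note that the LR branch contributes fewer tokens (16) than the HR branch (64); the learned queries in the sp-Trans sketch are what resample the LR features onto the HR grid before the two streams are fused.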