A Lightweight Multi-scale Crossmodal Text-Image Retrieval Method In Remote Sensing

2021 
Remote sensing (RS) crossmodal text-image retrieval has become a research hotspot in recent years owing to its application in semantic localization. However, since semantic localization requires multiple inference passes over image slices, designing a crossmodal retrieval model that achieves good performance with less computation has become an urgent and challenging task. In this paper, considering the multi-scale characteristics and target redundancy of RS imagery, a concise but effective crossmodal retrieval model (LW-MCR) is designed. The proposed model incorporates multi-scale information and dynamically filters out redundant features when encoding RS images, while text features are obtained via lightweight group convolution. To improve the retrieval performance of LW-MCR, we propose a novel hidden-supervision optimization method based on knowledge distillation. This method enables the proposed model to acquire the dark knowledge of the teacher network's intermediate layers and representation layer, which significantly improves the accuracy of our lightweight model. Finally, building on contrastive learning, we present a method that employs unlabeled data to further boost the performance of the RS retrieval model. Experimental results on four RS image-text datasets demonstrate the efficiency of LW-MCR in RS crossmodal retrieval tasks.
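Only the abstract is reproduced on this page, so the paper's actual losses are not available here. As a rough, hypothetical sketch of the two training objectives the abstract describes, the PyTorch-style snippet below combines (a) a multi-level distillation loss that supervises a lightweight student with a teacher's intermediate features and softened final representation, and (b) an InfoNCE contrastive loss over two augmented views of unlabeled images. All function names, arguments, and default values are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def hidden_distillation_loss(student_feats, teacher_feats,
                             student_repr, teacher_repr,
                             temperature=4.0, alpha=0.5):
    """Match a lightweight student to a teacher at two levels:
    (1) MSE between paired intermediate ("hidden") feature maps, and
    (2) KL divergence between softened final representations,
    i.e. the classic soft-target "dark knowledge" term.
    (Hypothetical sketch; not the paper's exact formulation.)"""
    # (1) Hint loss over matched intermediate layers. Assumes the
    # student features were already projected to the teacher's shapes.
    hint = sum(F.mse_loss(s, t.detach())
               for s, t in zip(student_feats, teacher_feats))

    # (2) Soft-target loss on the representation layer; the usual
    # temperature**2 factor restores gradient magnitude.
    log_p_student = F.log_softmax(student_repr / temperature, dim=-1)
    p_teacher = F.softmax(teacher_repr.detach() / temperature, dim=-1)
    soft = F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2

    return alpha * hint + (1.0 - alpha) * soft


def unlabeled_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """InfoNCE over two augmented views of the same unlabeled images:
    matching views are positives; all other pairs in the batch act
    as negatives. (Assumed form of the unlabeled-data objective.)"""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    return F.cross_entropy(logits, targets)
```

In a full training pipeline, losses of this kind would typically be added to the main cross-modal ranking objective (e.g. a triplet loss over matched image-text pairs) with weighting coefficients tuned on a validation set.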