Feature Interaction Based Graph Convolutional Networks for Image-Text Retrieval.

2021 
To bridge the heterogeneity gap between visual and linguistic data in the image-text retrieval task, many methods have been proposed and significant progress has been made. Recently, some works exploit finer-grained information, such as relations between regions in an image or semantic connections between words in a text, to further improve the image and text representations; however, the cross-modal relation between image regions and text words remains underexplored in these representations. Current methods therefore lack feature interaction at the representation level. To address this, we propose a novel image-text retrieval method that introduces inter-modal feature interaction into the graph convolutional networks (GCNs) over image and text fragments. Through the feature interaction between fragments of different modalities and the information propagation of the GCN, the proposed method captures richer inter-modal interaction information for image-text retrieval. Experimental results on the MS COCO and Flickr30K datasets show that the proposed method outperforms state-of-the-art methods.
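The abstract only names the two ingredients, inter-modal feature interaction between fragments and GCN-based propagation, without specifying how they are wired together. The sketch below is a minimal illustration under assumptions of our own, not the authors' architecture: cross-modal interaction is approximated by scaled dot-product cross-attention from image regions to text words, and propagation by a single dense GCN layer whose adjacency is built from feature similarity. All module and variable names (CrossModalInteraction, GCNLayer, regions, words) are hypothetical.

```python
# Hedged sketch of cross-modal feature interaction followed by GCN propagation.
# Assumptions: region features (n_regions x dim) and word features (n_words x dim)
# come from upstream encoders; the graph is dense and similarity-weighted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalInteraction(nn.Module):
    """Enrich each image-region feature with attended text-word context."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, regions: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
        # regions: (n_regions, dim), words: (n_words, dim)
        attn = torch.softmax(
            self.q(regions) @ self.k(words).t() / regions.size(-1) ** 0.5, dim=-1
        )
        # Concatenate each region with its attended word context.
        return torch.cat([regions, attn @ self.v(words)], dim=-1)


class GCNLayer(nn.Module):
    """One graph-convolution step over a similarity-based dense graph."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Row-normalised feature similarity serves as the adjacency matrix.
        adj = torch.softmax(x @ x.t(), dim=-1)
        return F.relu(self.proj(adj @ x))


if __name__ == "__main__":
    dim = 256
    regions = torch.randn(36, dim)   # e.g. detector region features
    words = torch.randn(12, dim)     # e.g. word embeddings from a text encoder
    interact = CrossModalInteraction(dim)
    gcn = GCNLayer(2 * dim, dim)
    enriched_regions = gcn(interact(regions, words))
    print(enriched_regions.shape)    # torch.Size([36, 256])
```

The enriched fragment features would then feed a similarity or matching loss for retrieval; the paper's actual interaction and graph construction may differ from this sketch.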