Fine-Grained Image-Text Retrieval via Complementary Feature Learning

2021 
The fine-grained image-text retrieval task aims to retrieve samples of the same fine-grained subcategory across modalities, e.g., from an image query to text or vice versa. The key is to learn an effective feature representation and to align images and texts. This paper proposes a novel Complementary Feature Learning (CFL) method for fine-grained image-text retrieval. First, CFL encodes images with a Convolutional Neural Network and texts with Bidirectional Encoder Representations from Transformers (BERT); with the help of a Frequent Pattern Mining technique (for images) and the special classification token of BERT (for texts), stronger fine-grained features are learned. Second, the image and text representations are aligned in a common latent space by pairwise dictionary learning. Finally, a score function is learned to measure the relevance between image-text pairs. We verify our method on two specific fine-grained image-text retrieval tasks, and extensive experiments demonstrate the effectiveness of CFL.
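To make the described pipeline concrete, below is a minimal sketch of the encode-align-score structure, not the authors' implementation: the backbone choices (ResNet-50, bert-base-uncased), the projection dimensionality, and the cosine-similarity score function are assumptions, and the Frequent Pattern Mining step and the pairwise dictionary learning objective are abstracted into simple linear projections.

```python
# Illustrative sketch only: assumed backbones and a cosine score stand in for
# CFL's frequent-pattern-mined image features, pairwise dictionary learning,
# and learned score function.
import torch
import torch.nn as nn
import torchvision.models as models
from transformers import BertModel


class ImageEncoder(nn.Module):
    """CNN image encoder; the fine-grained pattern-mining step is abstracted away."""
    def __init__(self, dim=256):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Keep everything up to (and including) global average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.proj = nn.Linear(2048, dim)  # stand-in for the learned latent codes

    def forward(self, images):                    # images: (B, 3, H, W)
        feats = self.features(images).flatten(1)  # (B, 2048)
        return self.proj(feats)                   # (B, dim)


class TextEncoder(nn.Module):
    """BERT text encoder using the [CLS] token as the sentence representation."""
    def __init__(self, dim=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.proj = nn.Linear(self.bert.config.hidden_size, dim)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]          # [CLS] token embedding
        return self.proj(cls)                      # (B, dim)


def score(img_codes, txt_codes):
    """Relevance between codes in the shared latent space
    (cosine similarity stands in for the learned score function)."""
    img = nn.functional.normalize(img_codes, dim=-1)
    txt = nn.functional.normalize(txt_codes, dim=-1)
    return img @ txt.t()                           # (num_images, num_texts)
```

In a retrieval setting, the two encoders would be trained so that matching image-text pairs receive higher scores than non-matching ones, and the resulting score matrix ranks candidates from the opposite modality for each query.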