Deep Centralized Cross-modal Retrieval

2021 
Mainstream cross-modal retrieval approaches generally focus on measuring the similarity between different types of data in a common subspace. Most of these methods are based on pairs or triplets of instances, which leads to two disadvantages: 1) because of the high discrepancy among sampled pairs and triplets, a large fraction of losses may be zero, in which case the model receives no gradient and cannot be updated; 2) global structure in the common subspace cannot be fully exploited for cross-modal retrieval. To address these problems, we present a novel cross-modal retrieval method called Deep Centralized Cross-modal Retrieval (DCCMR). Specifically, we first learn a center for the embeddings of each class in the common space, and then propose a double-quadruplet-center loss that forces the distance between a sample and the centers of other classes to exceed its distance to the center of its own class. To the best of our knowledge, DCCMR is among the first works to combine quadruplet loss and center loss, leading to more stable results. Comprehensive experiments on three widely used benchmark datasets verify that our method achieves performance comparable to state-of-the-art cross-modal retrieval methods and demonstrate the usefulness and effectiveness of DCCMR.
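
The abstract does not give the exact form of the double-quadruplet-center loss, so the following is only a minimal sketch of a center-based hinge loss in the spirit described above: each embedding is pulled toward its own learned class center and pushed, by a margin, away from the centers of other classes. The class `CenterQuadrupletLoss`, the margin value, and the dimensions are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a center-based margin loss (assumed form, not the official DCCMR loss).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CenterQuadrupletLoss(nn.Module):
    """Pushes each embedding closer to its own class center than to any
    other class center, by at least a fixed margin (hinge formulation)."""

    def __init__(self, num_classes: int, embedding_dim: int, margin: float = 0.5):
        super().__init__()
        # One learnable center per semantic class in the common subspace.
        self.centers = nn.Parameter(torch.randn(num_classes, embedding_dim))
        self.margin = margin

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Squared Euclidean distance from every embedding to every center.
        dists = torch.cdist(embeddings, self.centers) ** 2      # (batch, num_classes)
        # Distance of each sample to its ground-truth class center.
        pos = dists.gather(1, labels.unsqueeze(1))               # (batch, 1)
        # Mask out the true class; keep distances to all other centers.
        mask = F.one_hot(labels, self.centers.size(0)).bool()
        neg = dists.masked_fill(mask, float("inf"))
        # Hinge: same-class distance plus margin should stay below the
        # distance to the nearest other-class center.
        loss = F.relu(pos + self.margin - neg.min(dim=1, keepdim=True).values)
        return loss.mean()


# Usage sketch: image and text embeddings projected into the shared space are
# both penalized with the same centers, encouraging modality-invariant clusters.
img_emb = torch.randn(8, 128)
txt_emb = torch.randn(8, 128)
labels = torch.randint(0, 10, (8,))
criterion = CenterQuadrupletLoss(num_classes=10, embedding_dim=128)
loss = criterion(img_emb, labels) + criterion(txt_emb, labels)
```

Because the comparison is made against learned centers rather than individually sampled pairs or triplets, fewer terms collapse to zero and the gradient reflects the global class layout of the common subspace, which is the motivation stated in the abstract.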