A Mutual Information-Based Disentanglement Framework for Cross-Modal Retrieval

2021 
Cross-modal retrieval essentially extracts the shared semantics of an object across two different modalities. However, the "modality gap" can significantly limit performance when semantics are compared at the level of individual modality samples. In this paper, to cope with the heterogeneity of such data, we propose a novel mutual information-based disentanglement framework for capturing the precise shared semantics in cross-modal scenes. First, we design a disentanglement framework that extracts the shared components of each modality, providing the basis for measuring semantics with mutual information. Second, we measure semantic associations from the perspective of distributions, which overcomes the perturbations introduced by the modality gap. Finally, we formalize our framework and theoretically show that mutual information can achieve strong performance under the disentanglement framework. Extensive experiments on two large benchmarks demonstrate that our approach achieves significant performance on the cross-modal retrieval task.
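To make the idea concrete, below is a minimal, illustrative sketch (not the authors' code) of the two ingredients the abstract names: each modality is disentangled into a shared and a private part, and a mutual-information lower bound between the shared parts of paired samples is maximized. The module names, dimensions, and the choice of InfoNCE as the MI estimator are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledEncoder(nn.Module):
    """Maps one modality's features to [shared | private] latent codes."""

    def __init__(self, in_dim: int, shared_dim: int, private_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, shared_dim + private_dim),
        )
        self.shared_dim = shared_dim

    def forward(self, x: torch.Tensor):
        z = self.net(x)
        return z[:, : self.shared_dim], z[:, self.shared_dim :]


def infonce_mi_lower_bound(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07):
    """InfoNCE lower bound on I(z_a; z_b); matching rows are positive pairs."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau            # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))     # diagonal entries are true pairs
    # Symmetric cross-entropy; a lower loss corresponds to a higher MI bound.
    return -0.5 * (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.t(), targets))


# Toy usage with random "image" and "text" features for a paired batch.
img_enc = DisentangledEncoder(in_dim=512, shared_dim=128, private_dim=64)
txt_enc = DisentangledEncoder(in_dim=300, shared_dim=128, private_dim=64)
img, txt = torch.randn(32, 512), torch.randn(32, 300)
s_img, _ = img_enc(img)
s_txt, _ = txt_enc(txt)
mi_bound = infonce_mi_lower_bound(s_img, s_txt)
(-mi_bound).backward()  # maximize the MI bound on the shared codes
```

Measuring the association on the shared codes of a whole batch, rather than comparing raw samples directly, reflects the abstract's point that a distribution-level measure is more robust to modality-gap perturbations.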