A Mutual Information-Based Disentanglement Framework for Cross-Modal Retrieval

2021 
Cross-modal retrieval essentially extracts the shared semantics of an object across two different modalities. However, the "modality gap" can significantly limit performance when semantics are compared at the level of individual modality samples. In this paper, to cope with the heterogeneity of such data, we propose a novel mutual information-based disentanglement framework for capturing the precise shared semantics in cross-modal scenes. First, we design a disentanglement framework that extracts the shared components of each modality, providing the basis for measuring semantics with mutual information. Second, we measure semantic associations from the perspective of distributions, which overcomes the perturbations introduced by the modality gap. Finally, we formalize our framework and theoretically show that mutual information can achieve strong performance under the disentanglement framework. Extensive experiments on two large benchmarks demonstrate that our approach achieves significant performance on the cross-modal retrieval task.
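To make the idea concrete, below is a minimal, illustrative sketch (not the authors' code) of the two ingredients the abstract names: each modality is disentangled into a shared and a private part, and a mutual-information lower bound between the shared parts of paired samples is maximized. The module names, dimensions, and the choice of InfoNCE as the MI estimator are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledEncoder(nn.Module):
    """Maps one modality's features to [shared | private] latent codes."""

    def __init__(self, in_dim: int, shared_dim: int, private_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, shared_dim + private_dim),
        )
        self.shared_dim = shared_dim

    def forward(self, x: torch.Tensor):
        z = self.net(x)
        return z[:, : self.shared_dim], z[:, self.shared_dim :]


def infonce_mi_lower_bound(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07):
    """InfoNCE lower bound on I(z_a; z_b); matching rows are positive pairs."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau            # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))     # diagonal entries are true pairs
    # Symmetric cross-entropy; a lower loss corresponds to a higher MI bound.
    return -0.5 * (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.t(), targets))


# Toy usage with random "image" and "text" features for a paired batch.
img_enc = DisentangledEncoder(in_dim=512, shared_dim=128, private_dim=64)
txt_enc = DisentangledEncoder(in_dim=300, shared_dim=128, private_dim=64)
img, txt = torch.randn(32, 512), torch.randn(32, 300)
s_img, _ = img_enc(img)
s_txt, _ = txt_enc(txt)
mi_bound = infonce_mi_lower_bound(s_img, s_txt)
(-mi_bound).backward()  # maximize the MI bound on the shared codes
```

Measuring the association on the shared codes of a whole batch, rather than comparing raw samples directly, reflects the abstract's point that a distribution-level measure is more robust to modality-gap perturbations.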