Multimodal Reconstruction Using Vector Representation

2018 
Recent work has demonstrated that neural embeddings from multiple modalities can be used to steer the outputs of generative adversarial networks. However, little work has been done on developing a procedure for combining vectors from different modalities for the purpose of reconstructing the input. Typically, embeddings from different modalities are simply concatenated into a larger input vector. In this paper, we propose learning a Common Vector Space (CVS) in which similar inputs from different modalities cluster together. We develop a framework to analyze the degree of reconstruction and robustness offered by the CVS. We apply the CVS to annotating, generating, and captioning images on MS-COCO. We show that the CVS is on par with existing techniques for multimodal embeddings while offering greater flexibility as the number of modalities increases.
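The abstract contrasts concatenating modality embeddings with learning a shared space where matching inputs from different modalities land close together. The paper does not specify its training objective here, but the idea can be sketched with a contrastive (InfoNCE-style) alignment loss between, say, image and caption features; all names (`W_img`, `W_txt`, the feature dimensions, the loss choice) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def project(x, W):
    """Map modality-specific features into the common space, L2-normalized."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_loss(za, zb, temperature=0.1):
    """InfoNCE-style loss: paired embeddings (the diagonal) should be
    more similar to each other than to any other item in the batch."""
    logits = za @ zb.T / temperature              # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # matching pairs on the diagonal

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 512))            # e.g. CNN image features (assumed dim)
txt = rng.normal(size=(8, 300))            # e.g. averaged word embeddings (assumed dim)
W_img = rng.normal(size=(512, 64)) * 0.01  # hypothetical learned projections
W_txt = rng.normal(size=(300, 64)) * 0.01
loss = contrastive_loss(project(img, W_img), project(txt, W_txt))
```

Minimizing such a loss over many (image, caption) pairs drives matching inputs from both modalities toward the same region of the common space, which is the clustering property the abstract describes; adding a third modality only requires one more projection into the same space, rather than growing a concatenated vector.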