Coupled-dynamic Learning for Vision and Language: Exploring Interaction between Different Tasks

2021 
Abstract
Intensive research interest has been devoted to the vision-and-language community. In particular, image captioning aims to generate natural language descriptions from image content, while, conversely, image synthesis aims to generate realistic images from natural language descriptions. Both tasks achieve promising results with Long Short-Term Memory (LSTM) networks, which model sequence dynamics as a hidden state at each time step. Nevertheless, research on these dynamics is usually confined to a single task, and the mutual relationship between the dynamics of different tasks remains unexplored. In this work, we present a novel coupled-dynamics formulation that iteratively reduces the distance between task-dependent dynamics during training. To inject information from the dual task into each individual network, we construct dual-loss architectures that interactively align the dynamics. We evaluate the proposed framework on the Flickr8k, Flickr30k, and MSCOCO datasets. Experimental results show that our approach boosts both tasks jointly and achieves competitive performance against state-of-the-art methods.
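The abstract gives no implementation details, so the following is only a minimal PyTorch-style sketch of the core idea it describes: each task keeps its own objective, plus a shared alignment term that pulls the two LSTMs' hidden-state dynamics together. The names `CoupledDynamicsLoss` and `align_weight`, and the choice of an L2 distance between hidden states, are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn as nn


class CoupledDynamicsLoss(nn.Module):
    """Illustrative dual-loss: each task's own loss plus an alignment
    term measuring the distance between the two LSTMs' dynamics."""

    def __init__(self, align_weight: float = 0.1):
        super().__init__()
        # Hypothetical trade-off weight between task loss and alignment.
        self.align_weight = align_weight

    def forward(self, caption_loss, synthesis_loss,
                caption_hidden, synthesis_hidden):
        # caption_hidden / synthesis_hidden: (batch, time, hidden) tensors
        # of LSTM hidden states from the two task networks, assumed here
        # to share the same hidden size and sequence length.
        align = torch.mean((caption_hidden - synthesis_hidden) ** 2)
        loss_caption = caption_loss + self.align_weight * align
        loss_synthesis = synthesis_loss + self.align_weight * align
        return loss_caption, loss_synthesis
```

Under this reading, training would back-propagate `loss_caption` through the captioning network and `loss_synthesis` through the synthesis network, so each task sees the other's dynamics through the shared alignment term, iteratively reducing the distance between them as the abstract describes.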