Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
2019
Prior work in visual dialog has focused on training deep neural models on VisDial in isolation. Instead, we present an approach to leverage pretraining on related vision-language datasets before transferring to visual dialog. We adapt the recently proposed ViLBERT model for multi-turn visually-grounded conversations. Our model is pretrained on the Conceptual Captions and Visual Question Answering datasets, and finetuned on VisDial. Our best single model outperforms prior published work by \(1\%\) absolute on NDCG and MRR.
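The recipe the abstract describes is a standard two-stage transfer pipeline: first pretrain a shared vision-language encoder on large generic corpora (Conceptual Captions, VQA), then finetune the same weights on the smaller downstream VisDial data. A minimal, purely illustrative sketch of that weight-reuse pattern is below; the names (`TinyEncoder`, `pretrain`, `finetune`) and the toy scalar model are assumptions for illustration, not the authors' actual ViLBERT code.

```python
import random

class TinyEncoder:
    """Stand-in for a ViLBERT-style encoder: a single scalar weight."""
    def __init__(self):
        self.w = 0.0

    def predict(self, x):
        return self.w * x

    def fit(self, data, lr=0.01, epochs=100):
        # Plain squared-error gradient descent on (x, y) pairs.
        for _ in range(epochs):
            for x, y in data:
                grad = 2 * (self.predict(x) - y) * x
                self.w -= lr * grad
        return self

def pretrain(encoder, caption_data):
    # Stage 1: large generic vision-language corpus (e.g. captions, VQA).
    return encoder.fit(caption_data)

def finetune(encoder, dialog_data):
    # Stage 2: small task-specific corpus; starts from pretrained weights
    # instead of random initialization.
    return encoder.fit(dialog_data, lr=0.005, epochs=50)

random.seed(0)
# Toy data: both tasks share the underlying mapping y = 3x,
# so pretraining transfers usefully to the downstream task.
captions = [(x, 3 * x) for x in (random.uniform(-1, 1) for _ in range(50))]
dialogs = [(x, 3 * x) for x in (random.uniform(-1, 1) for _ in range(5))]

enc = pretrain(TinyEncoder(), captions)
enc = finetune(enc, dialogs)        # reuses, then adapts, pretrained weights
```

The key design point mirrored here is that `finetune` receives the already-trained encoder rather than a fresh one, which is what distinguishes the transfer approach from training on VisDial in isolation.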
References: 61 · Citations: 43