DualNet: Domain-invariant network for visual question answering

2017 
Visual question answering (VQA) tasks use two types of images: abstract (illustrations) and real. Domain-specific differences exist between the two types of images with respect to “objectness,” “texture,” and “color.” Therefore, achieving similar performance by applying methods developed for real images to abstract images, and vice versa, is difficult. This is a critical problem in VQA, because image features are crucial clues for correctly answering the questions about the images. However, an effective, domain-invariant method can provide insight into the high-level reasoning required for VQA. We thus propose a method called DualNet that demonstrates performance that is invariant to the differences in real and abstract scene domains. Experimental results show that DualNet outperforms state-of-the-art methods, especially for the abstract images category.
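The abstract does not spell out DualNet's internals, but the idea of fusing image and question features through parallel pathways can be illustrated with a minimal sketch. Everything below (the dimensions, the choice of element-wise sum and product as the two pathways, and the linear answer scorer) is an assumption for illustration, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical sketch of a dual-pathway VQA fusion (assumed, not from the paper):
# project image and question features into a shared space, fuse them through
# two parallel operations (element-wise sum and product), then score answers.

rng = np.random.default_rng(0)

def project(x, w):
    """Linear projection into a shared embedding space with tanh nonlinearity."""
    return np.tanh(x @ w)

d_img, d_q, d_emb, n_answers = 2048, 300, 512, 1000  # assumed sizes

w_img = rng.standard_normal((d_img, d_emb)) * 0.01
w_q   = rng.standard_normal((d_q, d_emb)) * 0.01
w_out = rng.standard_normal((2 * d_emb, n_answers)) * 0.01

img_feat = rng.standard_normal(d_img)  # e.g., pooled CNN features of the image
q_feat   = rng.standard_normal(d_q)    # e.g., an embedding of the question

v = project(img_feat, w_img)
q = project(q_feat, w_q)

# Two fusion pathways combined: additive and multiplicative interactions.
fused = np.concatenate([v + q, v * q])
scores = fused @ w_out  # logits over a fixed answer vocabulary

print(scores.shape)  # (1000,)
```

Because the fusion operates on feature vectors rather than raw pixels, a design like this is agnostic to whether the input image was abstract or real, which is the kind of domain invariance the abstract claims.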