Semantically Guided Visual Question Answering

2018 
We present a novel approach to the challenging task of Visual Question Answering (VQA) that incorporates and enriches semantic knowledge in a VQA model. We first apply Multiple Instance Learning (MIL) to extract a richer visual representation covering concepts beyond objects, such as actions and colors. Motivated by the observation that semantically related answers often appear together in prediction, we further develop a new semantically-guided loss function that can drive weakly scored but correct answers toward the top while suppressing wrong answers. We show that these two ideas improve performance in a complementary way, and we demonstrate results competitive with the state of the art on two VQA benchmark datasets.
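The abstract does not spell out the MIL formulation. As an illustration only, a common MIL reduction treats image regions as instances and pools per-region concept scores into an image-level prediction; the function names, tensor shapes, and the choice of max versus noisy-OR pooling below are assumptions, not the paper's method.

```python
import torch

def mil_concept_scores(region_scores: torch.Tensor) -> torch.Tensor:
    """Max pooling over instances: an image exhibits a concept if at
    least one region does.

    region_scores: (batch, num_regions, num_concepts) per-region logits
                   for concepts such as objects, actions, and colors.
    returns:       (batch, num_concepts) image-level concept scores.
    """
    return region_scores.max(dim=1).values

def mil_noisy_or(region_probs: torch.Tensor) -> torch.Tensor:
    """Noisy-OR pooling, a common MIL alternative: the image has the
    concept unless every region lacks it.

    region_probs: (batch, num_regions, num_concepts) per-region
                  probabilities in [0, 1].
    """
    return 1.0 - torch.prod(1.0 - region_probs, dim=1)
```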
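Likewise, the exact semantically-guided loss is not given here. One plausible reading, sketched under stated assumptions, is a soft cross-entropy whose target distribution spreads probability mass from the ground-truth answer onto semantically similar answers (measured, say, by word-embedding similarity), so that related correct answers are rewarded and unrelated ones suppressed. Everything below, including the temperature `tau` and the use of per-answer embeddings, is hypothetical.

```python
import torch
import torch.nn.functional as F

def semantically_guided_loss(logits: torch.Tensor,
                             target_idx: torch.Tensor,
                             answer_emb: torch.Tensor,
                             tau: float = 0.1) -> torch.Tensor:
    """Soft cross-entropy guided by answer-to-answer similarity.

    logits:     (batch, num_answers) raw model scores.
    target_idx: (batch,) index of the ground-truth answer.
    answer_emb: (num_answers, dim) embedding per answer, e.g. word vectors.
    tau:        temperature controlling how far mass spreads; as tau -> 0
                this recovers ordinary cross-entropy on the single target.
    """
    emb = F.normalize(answer_emb, dim=-1)        # (A, d), unit-norm rows
    sim = emb[target_idx] @ emb.t()              # (batch, A) cosine similarity
    soft_targets = F.softmax(sim / tau, dim=-1)  # (batch, A) smoothed targets
    log_probs = F.log_softmax(logits, dim=-1)    # (batch, A)
    # Weakly scored but semantically correct answers receive gradient
    # toward the top; answers dissimilar to the target are pushed down.
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Usage with made-up shapes: 1000 candidate answers, 300-d embeddings.
logits = torch.randn(2, 1000)
target = torch.tensor([3, 42])
answer_emb = torch.randn(1000, 300)
loss = semantically_guided_loss(logits, target, answer_emb)
```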