Semantically Guided Visual Question Answering

2018 
We present a novel approach to the challenging task of Visual Question Answering (VQA) that incorporates and enriches semantic knowledge in a VQA model. We first apply Multiple Instance Learning (MIL) to extract a richer visual representation covering concepts beyond objects, such as actions and colors. Motivated by the observation that semantically related answers often appear together in prediction, we further develop a new semantically-guided loss function that can drive weakly scored but correct answers toward the top while suppressing wrong answers. We show that these two ideas improve performance in a complementary way, and we demonstrate results competitive with the state of the art on two VQA benchmark datasets.
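The abstract does not spell out the MIL formulation. As an illustration only, a common MIL reduction treats image regions as instances and pools per-region concept scores into an image-level prediction; the function names, tensor shapes, and the choice of max versus noisy-OR pooling below are assumptions, not the paper's method.

```python
import torch

def mil_concept_scores(region_scores: torch.Tensor) -> torch.Tensor:
    """Max pooling over instances: an image exhibits a concept if at
    least one region does.

    region_scores: (batch, num_regions, num_concepts) per-region logits
                   for concepts such as objects, actions, and colors.
    returns:       (batch, num_concepts) image-level concept scores.
    """
    return region_scores.max(dim=1).values

def mil_noisy_or(region_probs: torch.Tensor) -> torch.Tensor:
    """Noisy-OR pooling, a common MIL alternative: the image has the
    concept unless every region lacks it.

    region_probs: (batch, num_regions, num_concepts) per-region
                  probabilities in [0, 1].
    """
    return 1.0 - torch.prod(1.0 - region_probs, dim=1)
```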
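Likewise, the exact semantically-guided loss is not given here. One plausible reading, sketched under stated assumptions, is a soft cross-entropy whose target distribution spreads probability mass from the ground-truth answer onto semantically similar answers (measured, say, by word-embedding similarity), so that related correct answers are rewarded and unrelated ones suppressed. Everything below, including the temperature `tau` and the use of per-answer embeddings, is hypothetical.

```python
import torch
import torch.nn.functional as F

def semantically_guided_loss(logits: torch.Tensor,
                             target_idx: torch.Tensor,
                             answer_emb: torch.Tensor,
                             tau: float = 0.1) -> torch.Tensor:
    """Soft cross-entropy guided by answer-to-answer similarity.

    logits:     (batch, num_answers) raw model scores.
    target_idx: (batch,) index of the ground-truth answer.
    answer_emb: (num_answers, dim) embedding per answer, e.g. word vectors.
    tau:        temperature controlling how far mass spreads; as tau -> 0
                this recovers ordinary cross-entropy on the single target.
    """
    emb = F.normalize(answer_emb, dim=-1)        # (A, d), unit-norm rows
    sim = emb[target_idx] @ emb.t()              # (batch, A) cosine similarity
    soft_targets = F.softmax(sim / tau, dim=-1)  # (batch, A) smoothed targets
    log_probs = F.log_softmax(logits, dim=-1)    # (batch, A)
    # Weakly scored but semantically correct answers receive gradient
    # toward the top; answers dissimilar to the target are pushed down.
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Usage with made-up shapes: 1000 candidate answers, 300-d embeddings.
logits = torch.randn(2, 1000)
target = torch.tensor([3, 42])
answer_emb = torch.randn(1000, 300)
loss = semantically_guided_loss(logits, target, answer_emb)
```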