VQA as a factoid question answering problem: A novel approach for knowledge-aware and explainable visual question answering

2021 
Abstract With recent advances in machine perception and scene understanding, Visual Question Answering (VQA) has attracted considerable attention from researchers, who train neural models to jointly analyze, ground, and reason over the multi-modal space of an image's visual context and natural language in order to answer natural-language questions about the image contents. Although recent work has achieved significant improvements over state-of-the-art models for questions that can be answered by referring solely to the visual context of the image, such models are often limited, being incapable of tackling questions that require external world knowledge beyond the visible content. While research has recently begun to address external-knowledge-based VQA as well, there remains significant room for improvement, as few studies exist in this area. Motivated by these challenges, this paper aims to answer free-form, open-ended natural-language questions that draw not only on the visual context of an image but also on external world knowledge. Inspired by the human cognitive ability to comprehend and reason out answers from a given set of facts, we propose a novel model architecture that casts VQA as a factoid question answering problem, leveraging state-of-the-art deep learning techniques to reason over and infer answers to free-form questions, with the goal of improving the state of the art in open-ended visual question answering.
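The abstract does not detail the proposed architecture, but the core paradigm it describes, converting an image's visual context and retrieved external knowledge into textual facts and then answering the question with a factoid question answering model, can be sketched as follows. This is a minimal illustration under stated assumptions: the fact strings, the `answer_from_facts` helper, and the use of an off-the-shelf extractive QA model (Hugging Face's `question-answering` pipeline) are hypothetical stand-ins, not the paper's actual components.

```python
# Minimal sketch of the "VQA as factoid QA" paradigm described in the
# abstract. NOT the paper's architecture: the facts below are hypothetical
# stand-ins for what a captioning/detection model and an external knowledge
# base would supply, and the extractive QA model is an off-the-shelf
# substitute for the paper's reasoning component.
from transformers import pipeline

# Step 1 (assumed): facts derived from the image (e.g. captions, detected
# objects) plus retrieved external world knowledge, all as plain text.
image_facts = [
    "A man is holding an umbrella on a rainy street.",
    "Two yellow taxis are parked at the curb.",
]
external_knowledge = [
    "Umbrellas are used to stay dry in the rain.",
]

# Step 2: treat the concatenated facts as the reading-comprehension
# context for a standard factoid (extractive) QA model.
qa_model = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)

def answer_from_facts(question: str) -> str:
    """Answer a free-form question by extractive QA over the fact set."""
    context = " ".join(image_facts + external_knowledge)
    result = qa_model(question=question, context=context)
    return result["answer"]

if __name__ == "__main__":
    # A question that needs external knowledge, not just the pixels:
    print(answer_from_facts("Why is the man holding an umbrella?"))
```

Because the answer is grounded in an explicit set of facts, the same fact set can serve as a human-readable explanation of the prediction, which is what makes this framing attractive for the knowledge-aware and explainable VQA setting the title refers to.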