Question-Guided Semantic Dual-Graph Visual Reasoning with Novel Answers

2021 
Visual Question Answering (VQA) has gained increasing attention as a cross-disciplinary research area spanning computer vision and natural language understanding. However, recent advances have mostly treated it as a closed-set classification problem, limiting the possible outputs to a fixed set of frequent answers drawn from the training set. Although effective on benchmark datasets, this paradigm is inherently defective: the VQA model will always fail on a question whose correct answer lies outside the answer set, which severely hampers its generalization and flexibility. To close this gap, we explore an open-set VQA setting in which models are evaluated on novel samples with unseen answers, given dynamic candidate answers from a candidate-generation module. For experimental purposes, we propose two oracle candidate-sampling strategies that serve as a proxy for the candidate-generation module and generate dynamic candidate answers for test samples. The conventional classification-based paradigm is no longer applicable in this setting. We therefore design a matching-based VQA model, in which a novel Single-Source Graph Convolutional Network (SSGCN) module jointly leverages question guidance and dual semantic answer graphs to produce more discriminative and relevant answer embeddings. Extensive experiments and ablation studies on two re-purposed benchmark datasets demonstrate the effectiveness of the proposed model.
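To make the matching-based formulation concrete, the sketch below scores each dynamic candidate answer by similarity to a fused question-image representation instead of predicting over a fixed classifier head. This is a minimal illustration only: the abstract does not specify the SSGCN architecture, so `AnswerGraphEncoder` stands in with a single generic graph-convolution layer, and all names, dimensions, and the identity adjacency matrix are hypothetical placeholders rather than the authors' actual design (which additionally uses question guidance and two semantic answer graphs).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnswerGraphEncoder(nn.Module):
    """Hypothetical stand-in for the paper's SSGCN: one generic GCN layer.

    `adj` is a row-normalized adjacency matrix over the candidate answers;
    the real module's question guidance and dual semantic graphs are not
    described in the abstract and are omitted here.
    """
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, answer_emb, adj):
        # Aggregate each answer's neighbors in the graph, then transform.
        return F.relu(self.proj(adj @ answer_emb))

class MatchingVQA(nn.Module):
    """Score dynamic candidate answers by similarity to a fused
    question-image feature; no fixed answer vocabulary is assumed."""
    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)
        self.answer_enc = AnswerGraphEncoder(dim)

    def forward(self, q_feat, v_feat, answer_emb, adj):
        # Fuse question and image features into one joint representation.
        joint = torch.tanh(self.fuse(torch.cat([q_feat, v_feat], dim=-1)))
        a = self.answer_enc(answer_emb, adj)          # (num_answers, dim)
        # Cosine similarity between the joint feature and each candidate.
        return F.normalize(joint, dim=-1) @ F.normalize(a, dim=-1).T

# Usage: the candidate set can change per question, so unseen answers
# are handled as long as their embeddings can be produced.
model = MatchingVQA(dim=512)
q = torch.randn(1, 512)            # question feature (placeholder)
v = torch.randn(1, 512)            # image feature (placeholder)
answers = torch.randn(20, 512)     # 20 dynamic candidate embeddings
adj = torch.eye(20)                # placeholder answer-graph adjacency
scores = model(q, v, answers, adj) # (1, 20) similarity scores
pred = scores.argmax(dim=-1)       # index of the best-matching candidate
```

Because prediction reduces to nearest-candidate matching in embedding space, the same trained model can rank any candidate set supplied at test time, which is what makes the open-set evaluation possible.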