Textbook Question Answering with Multi-type Question Learning and Contextualized Diagram Representation

2021 
Textbook question answering (TQA) is a multi-modal task that requires complex parsing and reasoning over scientific diagrams and long text to answer various types of questions, including true/false questions, reading comprehension, and diagram questions, making TQA a superset of question answering (QA) and visual question answering (VQA). In this paper, we introduce a Multi-Head TQA architecture (MHTQA) for solving the TQA task. To handle the long-text issue, we apply the open-source search engine Solr to select relevant sentences from lesson essays. To answer questions with different input formats while sharing knowledge across them, we build a bottom-shared model consisting of a transformer and three QA networks. For diagram questions, previous approaches did not incorporate textual context when producing diagram representations, leading to insufficient utilization of the diagrams' semantic information. To address this issue, we learn a contextualized diagram representation through a novel Contextualized Iterative Dual Fusion network (CIDF) that combines the visual and semantic features of the diagram image with the lesson essays. We jointly train on all question types in a multi-task learning manner, sharing knowledge through an efficient Multi-type Question Learning (MQL) sampling strategy. The experimental results show that our model outperforms the existing single model on all question types, with accuracy gains of 4.6% on Text T/F, 1.7% on Text MC, 1% on Diagram, and 1.9% overall.
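The bottom-shared design and the MQL sampling described above can be sketched as follows. This is a minimal numpy illustration under stated assumptions, not the authors' implementation: a single linear+ReLU layer stands in for the shared transformer, each question type gets its own classification head (class counts are assumptions), and `mql_batches` is one plausible reading of proportional question-type sampling.

```python
import numpy as np


class MHTQASketch:
    """Illustrative bottom-shared model with three question-type heads.

    Hypothetical sketch: the shared encoder is one linear+ReLU layer
    standing in for the transformer body, and each head is a linear
    classifier over the shared representation. All dimensions and the
    number of answer options per type are assumptions for illustration.
    """

    def __init__(self, in_dim=64, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W_shared = rng.standard_normal((in_dim, hidden)) * 0.1
        # One output head per question type: true/false (2 classes),
        # text multiple choice (4 options), diagram MC (4 options).
        self.heads = {
            "tf": rng.standard_normal((hidden, 2)) * 0.1,
            "text_mc": rng.standard_normal((hidden, 4)) * 0.1,
            "diagram": rng.standard_normal((hidden, 4)) * 0.1,
        }

    def forward(self, x, qtype):
        h = np.maximum(x @ self.W_shared, 0.0)    # shared representation
        logits = h @ self.heads[qtype]            # type-specific head
        e = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)  # softmax over answers


def mql_batches(counts, batch_size, seed=0):
    """Sample question types in proportion to their dataset counts --
    a plausible stand-in for the MQL sampling strategy (assumption)."""
    rng = np.random.default_rng(seed)
    types = list(counts)
    p = np.array([counts[t] for t in types], dtype=float)
    p /= p.sum()
    return rng.choice(types, size=batch_size, p=p)
```

A joint training loop would draw a type schedule from `mql_batches` and, for each sampled type, route that type's examples through the shared encoder and the matching head, so gradient updates to the shared layer are informed by every question type.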