Distant Supervision for Multi-Stage Fine-Tuning in Retrieval-Based Question Answering

Yuqing Xie, Wei Yang, Luchen Tan, Kun Xiong, Nicholas Jing Yuan, Baoxing Huai, Ming Li, Jimmy Lin

2020

We tackle the problem of question answering directly on a large document collection, combining simple “bag of words” passage retrieval with a BERT-based reader for extracting answer spans. In the context of this architecture, we present a data augmentation technique that uses distant supervision to automatically annotate paragraphs as either positive or negative examples. These examples supplement existing training data, and the two are used together to fine-tune BERT. We explore a number of details that are critical to achieving high accuracy in this setup: the proper sequencing of different datasets during fine-tuning, the balance between “difficult” and “easy” examples, and different approaches to gathering negative examples. Experimental results show that, with the appropriate settings, we can achieve large gains in effectiveness on two English and two Chinese QA datasets. We are able to achieve results at or near the state of the art without any modeling advances, which once again affirms the cliché “there’s no data like more data”.
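
The distant-supervision step described above can be illustrated with a minimal sketch. The abstract does not state the exact labeling rule, so this assumes a common heuristic: a retrieved paragraph that contains the known answer string is treated as a positive example, and one that does not is treated as a negative. The names `Example` and `label_paragraphs` are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Optional


class Example(NamedTuple):
    """One automatically labeled training example (hypothetical structure)."""
    question: str
    paragraph: str
    answer_start: Optional[int]  # character offset of the answer; None marks a negative


def label_paragraphs(question: str, answer: str, paragraphs: List[str]) -> List[Example]:
    """Label retrieved paragraphs by case-insensitive answer-string matching.

    Assumption: string matching stands in for whatever rule the authors used.
    Offsets assume that lowercasing does not change string length, which holds
    for ASCII text but not for every Unicode character.
    """
    needle = answer.lower()
    examples = []
    for paragraph in paragraphs:
        start = paragraph.lower().find(needle)  # -1 when the answer is absent
        examples.append(Example(question, paragraph, start if start >= 0 else None))
    return examples


paragraphs = [
    "Ottawa is the capital of Canada.",
    "Toronto is the largest city in Canada.",
]
examples = label_paragraphs("What is the capital of Canada?", "Ottawa", paragraphs)
positives = [e for e in examples if e.answer_start is not None]
negatives = [e for e in examples if e.answer_start is None]
```

Under this assumption, the first paragraph becomes a positive example and the second a negative one. How the two groups are balanced and in which order datasets are presented during fine-tuning are the settings the paper studies; this sketch does not model them.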
