SberQuAD -- Russian Reading Comprehension Dataset: Description and Analysis

Pavel Efimov, Leonid Boytsov, Pavel Braslavski

2019

This paper presents SberQuAD, a large Russian reading comprehension (RC) dataset created in a manner similar to the English SQuAD. SberQuAD contains about 50K question-paragraph-answer triples and is seven times larger than its nearest competitor. We provide a description of the dataset, a thorough analysis, and baseline experimental results. We scrutinized aspects of the dataset that can affect task performance: question/paragraph similarity, misspellings in questions, answer structure, and question types. We applied five popular RC models to SberQuAD and analyzed their performance. We believe our work makes an important contribution to research in multilingual question answering.
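To make the notion of a question-paragraph-answer triple more concrete, the following is a minimal sketch in Python. It assumes a SQuAD-like layout in which the answer is a span of the paragraph located by a character offset. The class name `Triple`, its fields, the helper functions, and the English example text are all hypothetical illustrations and are not taken from the dataset. The overlap function is one simple lexical proxy for question/paragraph similarity, not necessarily the measure used in the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class Triple:
    """Hypothetical container for one question-paragraph-answer triple."""
    question: str
    paragraph: str
    answer: str
    answer_start: int  # character offset of the answer inside the paragraph


def tokens(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def question_paragraph_overlap(t: Triple) -> float:
    """Fraction of question tokens that also occur in the paragraph.

    An illustrative lexical-overlap proxy (assumption), not the paper's metric.
    """
    q = tokens(t.question)
    if not q:
        return 0.0
    return len(q & tokens(t.paragraph)) / len(q)


# Invented example text, used only to exercise the sketch.
paragraph = "The Hermitage is a museum located in Saint Petersburg."
answer = "Saint Petersburg"
example = Triple(
    question="Where is the Hermitage located?",
    paragraph=paragraph,
    answer=answer,
    answer_start=paragraph.index(answer),  # span start computed, not hard-coded
)

overlap = question_paragraph_overlap(example)
```

Storing the answer as an offset into the paragraph, rather than as free text alone, lets a span-extraction model be scored by comparing its predicted span against the stored one.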

Keywords:

Reading comprehension
Artificial intelligence
Natural language processing
Question answering
Paragraph
Computer science
