Thesaurus-Based Methods for Assessment of Text Complexity in Russian

2020 
The study explores the problem of assessing complexity of Russian educational texts. In this paper, we focus on measuring conceptual complexity which is rarely selected as a research question and propose to use a thesaurus (or a linguistic ontology) to this end. We also compiled an original corpus of school textbooks on Social Studies, History used in high school, and textbooks for elementary school specifically for this set of text complexity experiments. On the first stage of the research, RuThes-Lite thesaurus, a linguistic knowledge base with the total size of 100,000 concepts, was used to elicit concepts in the texts of schoolbooks and represent them as graphs. To the best of our knowledge, we a new method for text complexity assessment using RuThes-Lite graphs and identify graphs-based semantic characteristics of texts that impact complexity. The most significant findings of the research include identification of statistically significant correlations of the selected features, such as node degree, with complexity of educational texts.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    19
    References
    0
    Citations
    NaN
    KQI
    []