Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation

2021 
A spell and grammar checker is profoundly essential for diverse publications especially for Bangla language in particular as it is spoken by millions of native speakers around the world. Considering the lack of research efforts, we demonstrate the development of a comprehensive Bangla spell and grammar checker with necessary resources. At first, a full-fledged and generalised Bangla monolingual corpus comprising over 100 million words has been built by scraping reputed, diversified online sources and then an extensive Bangla lexicon consisting of over 1 million unique words has been extracted from that corpus. Based on these corpus and lexicon, we have developed a combined spell and grammar checker application that simultaneously detects distinct spelling and grammatical mistakes and provides appropriate suggestions for both as well. The spell checker uses the Double Metaphone algorithm and Edit distance based on the distributed lexicons and numerical suffix dataset to detect all types of Bangla spelling mistakes with an accuracy rate of 97.21% individually. The grammar checker detects errors based on language model probability i.e. combination of bigram and trigram, and generates suggestions based on the Cosine similarity measure with the accuracy rate of 94.29% individually. The datasets and codes used in this work are freely available at https://git.io/JzJ4w .
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    33
    References
    0
    Citations
    NaN
    KQI
    []