Deep Learning-Based Context-Sensitive Spelling Typing Error Correction

2020 
This study aims to solve the context-sensitive spelling error problem for English documents. There are two types of spelling errors in English: non-word spelling errors and context-sensitive spelling errors. Non-word spelling errors are simple to correct because they can be detected simply by matching the words in a sentence against those in a dictionary; context-sensitive spelling errors are harder to correct because the relationship between the word to be corrected and its surrounding context must be known. Spelling errors are considered noise in every field that uses text information, and preprocessing via document correction is necessary to minimize this problem. Context-sensitive spelling errors include homophone errors (which arise from the incorrect use of words that sound the same but are spelled differently), typographical errors (caused by striking an incorrect key on a keyboard), grammatical errors (which occur when the user does not know the correct grammatical rules), and cross-word-boundary errors (which arise from incorrect spacing between words). This study focuses on typographical errors. The context-sensitive spelling error problem is addressed using deep learning rather than the existing statistical methods. The deep learning language model-based correction approach is divided into four parts, namely, correction based on word embedding information, contextual embedding information, an auto-regressive (AR) language model, and an auto-encoding (AE) language model. In this study, the best correction performance was obtained with the AE language model-based approach, and we verified its performance through a detailed correction test.
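The difference between the two error types can be illustrated with a minimal sketch. It is not the paper's implementation: the vocabulary, the names `is_non_word_error`, `candidates`, `correct`, and `toy_score`, and the hand-written context table are all hypothetical. In particular, `toy_score` is only a stand-in for the context score that a language model (for example, an AE model scoring a masked position) would supply.

```python
import string

# Toy vocabulary (assumption); a real system would load a full dictionary.
VOCAB = {"a", "letter", "form", "from", "the"}

# Hand-written stand-in for a language model's context knowledge (assumption).
TOY_CONTEXT = {("letter", "from"), ("from", "the")}


def is_non_word_error(word):
    """A non-word error is a token that is missing from the dictionary."""
    return word.lower() not in VOCAB


def edit_distance_one(word):
    """Strings one deletion, transposition, substitution, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    substitutes = {l + c + r[1:] for l, r in splits if r for c in letters}
    inserts = {l + c + r for l, r in splits for c in letters}
    return (deletes | transposes | substitutes | inserts) - {word}


def candidates(word):
    """The word itself plus every dictionary word within one edit of it."""
    return {word} | (edit_distance_one(word) & VOCAB)


def toy_score(tokens, index):
    """Count known neighbor pairs around tokens[index]; placeholder for an LM score."""
    score = 0
    if index > 0 and (tokens[index - 1], tokens[index]) in TOY_CONTEXT:
        score += 1
    if index + 1 < len(tokens) and (tokens[index], tokens[index + 1]) in TOY_CONTEXT:
        score += 1
    return score


def correct(tokens, index, score_fn):
    """Return the candidate that fits the context best; ties keep the original word."""
    original = tokens[index]

    def key(cand):
        trial = tokens[:index] + [cand] + tokens[index + 1:]
        return (score_fn(trial, index), cand == original)

    return max(sorted(candidates(original)), key=key)


sentence = ["a", "letter", "form", "the"]
print(is_non_word_error("frmo"))        # dictionary lookup catches this typo
print(is_non_word_error("form"))        # a real word, so lookup alone misses it
print(correct(sentence, 2, toy_score))  # context is needed to prefer "from"
```

Passing the scorer as an argument keeps candidate generation separate from context modeling, so the placeholder could be swapped for any of the four scoring strategies the abstract lists without touching the rest of the sketch.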