Korean-Optimized Word Representations for Out-of-Vocabulary Problems Caused by Misspelling Using Sub-character Information

2019 
In this paper, we propose Korean-optimized word representations that better address the out-of-vocabulary (OOV) problem caused by misspelling, an important issue in many applications based on natural language processing. Previous models do not fully consider the representations of misspelled OOV words. To overcome this limitation, we use sub-character information obtained from Korean Jamo units and adopt additional sub-character information to better withstand misspellings. Experimental results show that our model is about 2.3 times more accurate than the conventional model on misspelled words while still preserving the semantic relationships between words.
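
To make the idea of Jamo-level sub-character information concrete, the following is a minimal sketch that splits precomposed Hangul syllables into conjoining Jamo using standard Unicode arithmetic and then builds n-grams over the resulting Jamo sequence. It is an illustration only, not the model described in the paper: the function names `decompose` and `jamo_ngrams`, the boundary markers `<` and `>`, and the default n-gram length are assumptions chosen for the example.

```python
# Illustrative sketch only; not the authors' implementation.
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are laid out as
# (lead * 21 + vowel) * 28 + tail, which lets us recover the Jamo by division.

HANGUL_BASE = 0xAC00
HANGUL_COUNT = 19 * 21 * 28  # 11172 precomposed syllables

CHOSEONG = [chr(0x1100 + i) for i in range(19)]          # leading consonants
JUNGSEONG = [chr(0x1161 + i) for i in range(21)]         # vowels
JONGSEONG = [""] + [chr(0x11A8 + i) for i in range(27)]  # trailing consonants; index 0 means none


def decompose(word):
    """Return the list of Jamo for a word; non-Hangul characters pass through unchanged."""
    jamo = []
    for ch in word:
        index = ord(ch) - HANGUL_BASE
        if 0 <= index < HANGUL_COUNT:
            lead, rest = divmod(index, 21 * 28)
            vowel, tail = divmod(rest, 28)
            jamo.append(CHOSEONG[lead])
            jamo.append(JUNGSEONG[vowel])
            if tail:
                jamo.append(JONGSEONG[tail])
        else:
            jamo.append(ch)
    return jamo


def jamo_ngrams(word, n=3):
    """Build n-grams over the Jamo sequence (hypothetical helper; markers and n are assumptions)."""
    seq = ["<"] + decompose(word) + [">"]
    return ["".join(seq[i:i + n]) for i in range(len(seq) - n + 1)]
```

Because a single mistyped key usually changes only one Jamo, a misspelled word tends to share most of its Jamo n-grams with the intended word. That overlap is the intuition behind using sub-character units for misspelled OOV words.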