Korean-Optimized Word Representations for Out-of-Vocabulary Problems Caused by Misspelling Using Sub-character Information

Seonhghyun Kim, Jai Eun Kim, Seokhyun Hawang, Berlocher Ivan, Seung-Won Yang

2019

In this paper, we propose Korean-optimized word representations that better address the out-of-vocabulary (OOV) problem caused by misspelling. This problem is an important issue in many applications based on natural language processing, yet previous models do not fully consider the representations of misspelled OOV words. To overcome this limitation, we build word representations from sub-character information obtained from Korean Jamo units, and we adopt additional sub-character information to make the representations more robust to misspelling. Experimental results show that our model is about 2.3 times more accurate than the conventional model on misspelled words while still maintaining the semantic relationships between words.
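To make "sub-character information obtained from Korean Jamo units" concrete, the following is a minimal sketch, not the authors' implementation. It decomposes precomposed Hangul syllables into Jamo using the standard Unicode arithmetic and then extracts character n-grams over the Jamo sequence. The function names, the `_` placeholder for a missing final consonant, the `<`/`>` boundary markers, and the n-gram range are all assumptions made for illustration.

```python
# Jamo tables in Unicode order (compatibility Jamo, for readable output).
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19 initial consonants
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")        # 21 medial vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals, incl. none

HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3


def to_jamo(word, empty_final=""):
    """Decompose each precomposed Hangul syllable into its Jamo.

    empty_final is a hypothetical placeholder emitted when a syllable
    has no final consonant; other characters pass through unchanged.
    """
    out = []
    for ch in word:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            out.append(CHO[index // (21 * 28)])
            out.append(JUNG[(index % (21 * 28)) // 28])
            out.append(JONG[index % 28] or empty_final)
        else:
            out.append(ch)
    return "".join(out)


def jamo_ngrams(word, n_min=3, n_max=5, empty_final="_"):
    """Return the set of Jamo-level n-grams of a word (assumed n range)."""
    seq = "<" + to_jamo(word, empty_final) + ">"
    return {
        seq[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(seq) - n + 1)
    }


if __name__ == "__main__":
    print(to_jamo("한국"))
    # A correctly spelled word and a one-vowel misspelling share Jamo n-grams,
    # which is what lets a sub-character model relate the two.
    shared = jamo_ngrams("한국") & jamo_ngrams("한귝")
    print(sorted(shared))
```

Because a typical Korean typo changes one Jamo rather than a whole syllable, the misspelled form keeps many of the same Jamo n-grams as the intended word, whereas at the syllable level the two words would share far less.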

Keywords:

Word embedding
Natural language processing
Artificial intelligence
Vocabulary
Computer science
Out of vocabulary
Semantic relationship
Word representation
