Performance Comparison of Spoken Language Detection Models with Embedding Replacement

2021 
Deep learning-based abuse detection models face many limitations in improving detection accuracy due to frequent typos and spacing errors in Korean text. In particular, when spoken language is morphologically analyzed to generate training data, unnecessary morphemes are frequently extracted, making it difficult to capture the meaning of words; this is the largest cause of degraded accuracy in abuse detection models. In this paper, to overcome these problems of spoken Korean, we design and implement embedding-based detection models and compare their abuse detection accuracy. We use three embedding models for detection: fastText, SKT-KoBERT, and KoELECTRA; we then compare and evaluate the performance of each embedding-based abuse detection model through various experiments. In the experimental comparison of ambiguity resolution, SKT-KoBERT showed significantly higher performance than fastText. The comparison by pre-training method also showed higher performance for SKT-KoBERT than for KoELECTRA. Based on these experimental results, we believe more effective embedding techniques can be applied to a variety of spoken-language-based deep learning services.
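The abstract's core idea, replacing morpheme-based features with subword-aware embeddings so that typos and spacing errors do not break word meaning, can be illustrated with a minimal toy sketch. The code below is not the paper's implementation: it uses hashed character-bigram vectors as a simplified fastText-style analogue (the `embed`, `train_centroids`, and `predict` helpers and all parameters are hypothetical), with a nearest-centroid classifier standing in for the detection model.

```python
# Toy sketch of an embedding-based abuse-detection pipeline.
# Hypothetical stand-in for the paper's fastText/SKT-KoBERT/KoELECTRA models:
# character n-grams make the embedding robust to typos and spacing errors.
import zlib
import numpy as np

DIM = 16  # toy embedding dimension (real models use 300-768)

def subword_ngrams(word, n=2):
    """Character n-grams, fastText-style, so misspelled words still share subwords."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def embed(text, dim=DIM):
    """Average deterministic hashed n-gram vectors into one text embedding."""
    vecs = []
    for word in text.split():
        for gram in subword_ngrams(word):
            # crc32 gives a deterministic per-n-gram seed across runs
            rng = np.random.default_rng(zlib.crc32(gram.encode("utf-8")))
            vecs.append(rng.standard_normal(dim))
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def train_centroids(texts, labels):
    """One centroid embedding per class (a stand-in for the detection model)."""
    classes = sorted(set(labels))
    return {c: np.mean([embed(t) for t, l in zip(texts, labels) if l == c], axis=0)
            for c in classes}

def predict(text, centroids):
    """Label whose centroid is most similar to the text embedding."""
    v = embed(text)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))
```

Swapping `embed` for a contextual encoder such as SKT-KoBERT or KoELECTRA (e.g. pooled Transformer outputs) is the "embedding replacement" the title refers to; the downstream classifier stays unchanged.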