Thai Word Segmentation Based on Sequence-to-Sequence Model

2021 
Thai, as a low-resource language, still leaves considerable room for improvement in word segmentation performance. In this paper, we investigate a sequence-to-sequence model for Thai word segmentation, using two different recurrent neural networks, which transforms an input sequence into an output sequence. Furthermore, we evaluate the model on datasets from four different domains and compare it with several other word segmentation models; the F1 score on the encyclopedia dataset reaches 97.15%. The results show that the proposed model has superior performance and is more effective. Notably, the expected results can be achieved even with limited data resources.
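The abstract reports an F1 score but does not say how it is computed. As a rough illustration only, the sketch below assumes the common word-level convention for segmentation: a predicted word counts as correct when its character span matches a gold word span exactly. The function names `word_spans` and `segmentation_f1` are hypothetical and are not taken from the paper.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold_words, pred_words):
    """Return (precision, recall, f1) for one segmented sentence.

    Assumption: both segmentations cover the same underlying text, and a
    predicted word is correct only if its exact span appears in the gold set.
    """
    if "".join(gold_words) != "".join(pred_words):
        raise ValueError("gold and predicted segmentations cover different text")

    gold = word_spans(gold_words)
    pred = word_spans(pred_words)
    correct = len(gold & pred)

    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the prediction merges the last two gold words.
gold = ["ฉัน", "กิน", "ข้าว"]
pred = ["ฉัน", "กินข้าว"]
precision, recall, f1 = segmentation_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In the toy example one of two predicted words matches one of three gold words, so precision is 1/2 and recall is 1/3. A corpus-level score would typically sum the counts over all sentences before computing the ratios; whether the paper does so is not stated.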