How to Segment Turkish Words for Neural Text Classification

2020 
Neural text classifiers for agglutinative languages often suffer from large training vocabularies and high out-of-vocabulary rates at test time. The natural language processing community has developed numerous word segmentation procedures to alleviate these problems. However, their effect on the performance of neural classifiers of Turkish documents requires further investigation. In this empirical study, we carry out an extensive series of experiments to investigate how the choice of word segmentation procedure affects the performance of three different neural text classifiers on Turkish documents across multiple domains. Our experiments show that the choice of word segmentation procedure is another hyperparameter that needs tuning, and that the best choice may depend on the domain and the neural architecture.
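To make the vocabulary and out-of-vocabulary problem concrete, the following is a minimal sketch, not one of the procedures evaluated in the study. It compares plain whitespace tokenization with a crude fixed-length prefix truncation (a stand-in we assume here purely for illustration) on a toy Turkish corpus. The function names, the prefix length of 5, and the example sentences are all hypothetical.

```python
from typing import Callable, List, Set

Segmenter = Callable[[str], List[str]]


def whitespace_tokens(text: str) -> List[str]:
    """Baseline: every surface word form is its own token."""
    return text.split()


def prefix_tokens(text: str, k: int = 5) -> List[str]:
    """Illustrative stand-in for segmentation: keep only the first k
    characters of each word, a rough approximation of stripping suffixes."""
    return [word[:k] for word in text.split()]


def vocab(docs: List[str], segment: Segmenter) -> Set[str]:
    """Set of distinct tokens produced by `segment` over `docs`."""
    return {tok for doc in docs for tok in segment(doc)}


def oov_rate(train_docs: List[str], test_docs: List[str],
             segment: Segmenter) -> float:
    """Fraction of test tokens that never occur in the training vocabulary."""
    train_vocab = vocab(train_docs, segment)
    test_toks = [tok for doc in test_docs for tok in segment(doc)]
    if not test_toks:
        return 0.0
    unseen = sum(tok not in train_vocab for tok in test_toks)
    return unseen / len(test_toks)


# Toy data (hypothetical): inflected forms in the test set share stems
# with the training set but differ in their suffixes.
TRAIN_DOCS = ["evler güzel", "kitaplar masada"]
TEST_DOCS = ["evlerde kitaplardan"]

if __name__ == "__main__":
    for name, seg in [("whitespace", whitespace_tokens),
                      ("prefix-5", prefix_tokens)]:
        print(name, len(vocab(TRAIN_DOCS, seg)),
              oov_rate(TRAIN_DOCS, TEST_DOCS, seg))
```

On this toy corpus, whitespace tokenization treats "evlerde" and "kitaplardan" as unseen words, while the truncated forms match tokens seen in training. Real segmentation procedures behave differently and can also merge unrelated words, which is one reason the choice needs to be tuned empirically.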