How to Segment Turkish Words for Neural Text Classification

Abdullah Al Nahas, Aysenur Kulunk, Burak Gozutok, Soner Can Kalkan, Hakki Yagiz Erdinc

2020

Neural text classifiers for agglutinative languages often suffer from large training vocabularies and high out-of-vocabulary rates at test time. The natural language processing community has developed and used numerous word segmentation procedures to alleviate these problems. However, their effect on the performance of neural classifiers of Turkish documents requires further investigation. In this empirical study, we carry out an extensive series of experiments to investigate how the choice of word segmentation procedure affects the performance of three different neural text classifiers on Turkish documents across multiple domains. Our experiments show that the choice of word segmentation procedure is another hyperparameter that needs tuning. This choice may depend on the domain and the neural architecture.
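
To make the out-of-vocabulary problem concrete, the following minimal sketch compares the OOV rate of whole-word tokens with that of segmented tokens on a tiny toy corpus. Everything in it is an assumption made for illustration: the helper names `oov_rate` and `naive_segment`, the fixed three-character split, and the example words are hypothetical, and the split is a crude stand-in rather than any of the segmentation procedures evaluated in the study.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def naive_segment(word, stem_len=3):
    """Toy segmenter (assumption): split a word into a fixed-length prefix
    and a '##'-marked remainder. A real system would use a morphological
    analyzer or a learned subword model instead."""
    if len(word) <= stem_len:
        return [word]
    return [word[:stem_len], "##" + word[stem_len:]]


def segment_all(words):
    """Segment every word and flatten the pieces into one token list."""
    return [piece for w in words for piece in naive_segment(w)]


# Toy data (assumption): inflected forms of "ev" (house) and "kitap" (book).
train_words = "evler evde kitaplar".split()
test_words = "evde kitapta evlerde".split()

word_oov = oov_rate(train_words, test_words)
seg_oov = oov_rate(segment_all(train_words), segment_all(test_words))

print(f"word-level OOV rate: {word_oov:.2f}")
print(f"segmented OOV rate:  {seg_oov:.2f}")
```

On this toy data the segmented tokens have a lower OOV rate than whole words because unseen inflected forms share pieces with training words. Whether that translates into better classification accuracy is exactly the empirical question the study addresses.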

Keywords:

Artificial intelligence
Natural language processing
Psychology
Turkish
