Document classification of SuDer Turkish news corpora
2018
Word embeddings are successfully employed in various Natural Language Processing tasks, but training them requires large amounts of text, which is scarce for Turkish. In this work, we collected a large number of articles from two news websites and used the tags within the web pages as labels. The resulting corpora were tested with various document classification models. Embedding-based models performed better than models using traditional TF-IDF features. A neural network that simultaneously learns the word embeddings and the document classifier performed best.
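To make the TF-IDF baseline mentioned above concrete, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization, a smoothed IDF weighting, and a nearest-centroid classifier with cosine similarity; the abstract does not say which classifier was paired with the TF-IDF features, so that choice is an assumption. The toy documents and the labels `spor` and `ekonomi` are invented for illustration and stand in for page tags used as labels.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization on lowercased text.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    # Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(term for term in tokenize(doc) if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {term: v / norm for term, v in vec.items()} if norm else {}


def fit_centroids(docs, labels, idf):
    # Sum the TF-IDF vectors of each class; cosine similarity ignores the scale.
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, value in tfidf(doc, idf).items():
            sums[label][term] += value
    return {label: dict(vec) for label, vec in sums.items()}


def cosine(a, b):
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(doc, idf, centroids):
    # Return the label whose centroid is most similar to the document.
    vec = tfidf(doc, idf)
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))


# Invented toy data: each article's page tag serves as its class label.
train_docs = [
    "takım maçı kazandı gol attı",
    "lig maçı gol ile bitti",
    "dolar ve faiz yükseldi",
    "borsa faiz kararı ile düştü",
]
train_labels = ["spor", "spor", "ekonomi", "ekonomi"]

idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)

print(predict("takım gol attı", idf, centroids))
print(predict("faiz ve dolar", idf, centroids))
```

An embedding-based model would replace the sparse `tfidf` vectors with dense word vectors, either pretrained or learned jointly with the classifier, while the labeling scheme stays the same.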