Document classification of SuDer Turkish news corpora

Mehmet Umut Şen, Berrin A. Yanikoglu

2018

Word embeddings are successfully employed in various Natural Language Processing tasks, but training them requires a large amount of text, which is scarce for Turkish. In this work, we collected a large number of articles from two news websites and used the tags within the web pages as labels. We tested the resulting corpora with various document classification models. Embedding-based models performed better than models using traditional TF-IDF features. A neural network that simultaneously learns the word embeddings and the document classifier performed best.
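
To make the TF-IDF baseline mentioned in the abstract concrete, the following is a minimal sketch of TF-IDF features paired with a nearest-neighbour classifier over cosine similarity. The abstract does not say which classifier was used on the TF-IDF features, so the nearest-neighbour rule, the smoothed IDF formula, the function names, and the four toy Turkish headlines with their `spor`/`ekonomi` labels are all assumptions made for illustration, not the authors' setup or data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vectors, idf) for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption), so no term gets a zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (f / len(doc)) * idf[t] for t, f in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(query_tokens, train_vectors, train_labels, idf):
    """Label a document with the label of its most similar training document."""
    tf = Counter(query_tokens)
    # Terms unseen during training carry no IDF weight and are dropped.
    q = {t: (f / len(query_tokens)) * idf[t] for t, f in tf.items() if t in idf}
    scores = [cosine(q, v) for v in train_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]


# Hypothetical toy corpus; the labels stand in for the web-page tags.
train_docs = [
    "takım maçı kazandı".split(),
    "futbol maçı berabere bitti".split(),
    "borsa güne yükselişle başladı".split(),
    "dolar kuru yükseldi".split(),
]
train_labels = ["spor", "spor", "ekonomi", "ekonomi"]

vectors, idf = tfidf_vectors(train_docs)
prediction = classify("maçı kim kazandı".split(), vectors, train_labels, idf)
print(prediction)
```

Because TF-IDF matches surface forms only, an inflected form such as `yükselişle` shares nothing with `yükseldi` in this representation. This is one plausible reason why embedding-based models can do better on a morphologically rich language like Turkish, although the abstract itself does not give a reason.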

Keywords:

Information retrieval
Artificial neural network
Computer science
Pattern recognition
Artificial intelligence
Web page
Document classification
Turkish
Task analysis
Embedding
Natural language processing
