PRETO: a high-performance text mining tool for preprocessing Turkish texts

Volkan Tunali, Turgay Tugay Bilgin

2012


Text documents are usually unstructured and written in natural language. Applying conventional data mining techniques to text documents therefore requires a preprocessing step. In this paper, we introduce PRETO, a cross-platform, powerful, and scalable tool developed specifically for preprocessing Turkish texts. It offers a wide range of preprocessing options, including stemming, stopword filtering, statistical term filtering, and n-gram generation. We demonstrate the performance and scalability of PRETO through experiments on large document collections.
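
To make the preprocessing options named in the abstract concrete, the following is a minimal Python sketch of three of them: stopword filtering, n-gram generation, and a simple statistical term filter based on document frequency. It is not PRETO's implementation or interface. The function names, the tiny stopword list, and the thresholds `min_df` and `max_df_ratio` are all assumptions made for illustration. Stemming is left out because it requires a Turkish morphological analyzer or stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not PRETO's actual list).
STOPWORDS = {"ve", "bir", "bu", "için", "ile", "de", "da"}


def turkish_lower(text):
    # Turkish lowercases "I" to "ı" and "İ" to "i"; Python's default
    # str.lower() does not, so those two letters are mapped first.
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text):
    # [^\W\d_]+ matches runs of Unicode letters, so ç, ğ, ı, ö, ş, ü survive.
    return re.findall(r"[^\W\d_]+", turkish_lower(text))


def remove_stopwords(tokens, stopwords=STOPWORDS):
    # Drop tokens that appear in the stopword set.
    return [t for t in tokens if t not in stopwords]


def ngrams(tokens, n):
    # Word n-grams as space-joined strings; empty if fewer than n tokens.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def filter_by_document_frequency(docs, min_df=1, max_df_ratio=1.0):
    # Count in how many documents each term occurs, then keep only terms
    # whose document frequency lies within [min_df, max_df_ratio * len(docs)].
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    limit = max_df_ratio * len(docs)
    keep = {t for t, c in df.items() if min_df <= c <= limit}
    return [[t for t in doc if t in keep] for doc in docs]


if __name__ == "__main__":
    raw_docs = ["Işık ve gölge bir oyun", "Bu oyun için ışık gerekir"]
    docs = [remove_stopwords(tokenize(d)) for d in raw_docs]
    print(docs)
    print([ngrams(d, 2) for d in docs])
    print(filter_by_document_frequency(docs, min_df=2))
```

The filter operates on whole documents at once because document frequency is a property of the collection rather than of a single text. A tool aimed at large collections would likely stream documents and keep counts in a more compact structure.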

Keywords:

Data mining
Natural language processing
Natural language
Scalability
Turkish
Preprocessor
Artificial intelligence
Computer science
Text mining
Filter (signal processing)
