PRETO: a high-performance text mining tool for preprocessing Turkish texts
2012
Text documents are usually unstructured and written in natural language, so a preprocessing step is indispensable before conventional data mining techniques can be applied to them. In this paper, we introduce PRETO, a cross-platform, powerful, and scalable tool developed specifically for preprocessing Turkish texts. It offers a wide range of preprocessing options, including stemming, stopword filtering, statistical term filtering, and n-gram generation. We demonstrate the performance and scalability of PRETO through experiments on large document collections.
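To make the listed preprocessing options concrete, the following is a minimal Python sketch of three of them: stopword filtering, n-gram generation, and a simple statistical term filter based on document frequency. It is not PRETO's implementation or API. The function names, the tiny stopword list, and the `min_df` threshold are all assumptions chosen for illustration, and stemming is left out because a Turkish stemmer requires morphological analysis that does not fit in a short example.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list; not the list used by PRETO.
STOPWORDS = {"ve", "bir", "bu", "ile", "için"}


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, and lowercase.

    Note: str.lower() is not Turkish-aware (it maps "I" to "i", not "ı"),
    so this sketch assumes the input is already lowercase.
    """
    tokens = []
    for raw in text.split():
        token = raw.strip(".,;:!?()\"'").lower()
        if token:
            tokens.append(token)
    return tokens


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def ngrams(tokens, n):
    """Return all contiguous word n-grams, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def filter_by_document_frequency(docs, min_df=2):
    """Keep only terms that occur in at least min_df documents.

    This is one simple form of statistical term filtering; the criteria
    PRETO actually applies are not described in the abstract.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return [[t for t in tokens if df[t] >= min_df] for tokens in docs]


if __name__ == "__main__":
    raw_docs = [
        "bu bir deneme metni",
        "deneme metni ve veri",
        "veri madenciliği için metin",
    ]
    docs = [remove_stopwords(tokenize(d)) for d in raw_docs]
    print(filter_by_document_frequency(docs, min_df=2))
    print(ngrams(docs[1], 2))
```

Running the steps in this order (tokenize, filter stopwords, then filter by frequency) keeps rare terms out of the vocabulary before n-grams are built, which is one reasonable ordering but not necessarily the one PRETO uses.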