Arabic Light Stemming: A Comparative Study between P-Stemmer, Khoja Stemmer, and Light10 Stemmer
2019
Arabic is a derivational language with a deep structure and rich word meanings; one of its main challenges is its heavy dependence on morphology. Arabic Natural Language Processing (ANLP) tools are required for many tasks, such as machine learning. For text classification, ANLP tools serve as preprocessing steps, which include, but are not limited to, stemming, normalization, and stop-word removal. In this work, we collected 2,000 news articles from Arabic online newspapers and classified them using Support Vector Machine (SVM) and Naive Bayes (NB) classifiers. The classification task was conducted to compare three Arabic light stemmers: P-Stemmer, Khoja Stemmer, and Light10 Stemmer. P-Stemmer outperformed the other two stemmers with both classifiers, achieving an F1-measure of 0.92 with SVM and 0.90 with NB.
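The preprocessing steps named above (normalization, stop-word removal, and stemming) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the stop-word list, the prefix list, the minimum stem length of three letters, and the function names `normalize`, `light_stem`, and `preprocess` are hypothetical and do not reproduce the actual rules of P-Stemmer, Khoja Stemmer, or Light10 Stemmer. The classification stage is not shown.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only;
# they are NOT the stop-word or affix lists used in the study.
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")  # longest candidates first
MIN_STEM_LEN = 3  # assumed lower bound so short words are left intact

# Arabic short-vowel marks (U+064B..U+0652) and tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(token):
    """Strip diacritics/tatweel and unify hamzated alef forms to bare alef."""
    token = DIACRITICS.sub("", token)
    return re.sub("[إأآ]", "ا", token)


# Normalize the stop words too, so lookups match normalized tokens.
NORMALIZED_STOP_WORDS = {normalize(w) for w in STOP_WORDS}


def light_stem(token):
    """Remove at most one prefix if enough letters remain; suffixes are kept."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= MIN_STEM_LEN:
            return token[len(prefix):]
    return token


def preprocess(text):
    """Whitespace-tokenize, normalize, drop stop words, then stem."""
    tokens = [normalize(t) for t in text.split()]
    tokens = [t for t in tokens if t not in NORMALIZED_STOP_WORDS]
    return [light_stem(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("الكتاب في المكتبة"))
```

In a pipeline like the one the abstract describes, the resulting token lists would then be turned into feature vectors and passed to the SVM and NB classifiers; that stage is left out here to keep the sketch dependency-free.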