Comparative Study of Arabic Text Categorization Using Feature Selection Techniques and Four Classifier Models

2020 
Text classification is the process of assigning appropriate categories to free text according to its content. It is one of the important tasks in text mining. Numerous studies have been conducted on natural language processing using Japanese, French, Latin and Turkish documents, but the number of works dealing with text written in Arabic is still limited. In this paper we conduct a comparative study of three feature selection methods using four well-known classifiers, namely Decision Tree, Naive Bayes, K-Nearest Neighbors and Support Vector Machine. The corpus contains 250 Arabic texts belonging to five classes: sport, politics, economics, culture and art, and society. This data set is used to evaluate and compare the effectiveness of the obtained models. The experimental results reveal that using an improved Chi-square method for feature selection and a Support Vector Machine as the classifier outperforms the other combinations in terms of precision. This combination significantly improves the performance of the Arabic text classification model. The highest precision obtained for this model is 89.9%.
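To make the feature selection step concrete, the following is a minimal sketch of the standard Chi-square term score commonly used in text categorization, not the improved variant evaluated in the paper, whose details are not given in the abstract. The function names `chi_square` and `top_terms`, the toy English tokens, and the choice of taking the maximum score over categories are all assumptions made for illustration.

```python
def chi_square(docs, labels, term, category):
    """Standard chi-square score between a term and a category.

    docs is a list of token lists and labels the matching category names.
    Hypothetical helper for illustration, not the paper's improved method.
    """
    n = len(docs)
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document in another category
        elif in_cat:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term or category covers all/no docs
    return n * (a * d - c * b) ** 2 / denom


def top_terms(docs, labels, k):
    """Keep the k terms with the highest chi-square score over any category."""
    vocab = {t for doc in docs for t in doc}
    cats = set(labels)
    scores = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy placeholder corpus (assumed data, not the 250-document Arabic corpus).
docs = [["goal", "match"], ["match", "team"], ["vote", "party"], ["party", "law"]]
labels = ["sport", "sport", "politics", "politics"]
selected = top_terms(docs, labels, 2)
```

In a pipeline like the one the abstract describes, the selected terms would define the feature vectors passed to a classifier such as a Support Vector Machine.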