A new similarity measure for automatic text categorization based on vector space model

Said Bahassine, Abdellah Madani, Mohamed Kissi

2017

Text classification is the process of assigning a predefined class or category to an unclassified text based on its content. It is an important task in text mining. Several text classification algorithms have been developed for natural languages such as English, Chinese, and Dutch; however, the number of related works for Arabic is limited. In this research, we attempt to generalize the method used to compute category representative vectors and propose a new similarity measure (referred to hereafter as origin-similarity) based on a Vector Space Model (VSM) to classify Arabic documents, and we compare the proposed method with well-known similarity techniques. The evaluation used a dataset of 250 Arabic texts independently classified into five classes: art and culture, economics, politics, society, and sport. The experimental findings show that Arabic text classification using the VSM provides the best results and can assign the category of a text with an accuracy of 91%.
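To make the general VSM classification scheme concrete, the sketch below builds one representative vector per category and assigns a new text to the category whose vector is most similar to it. This is not the authors' method: the abstract does not define origin-similarity or the generalized representative-vector computation, so the sketch assumes raw term-frequency weighting, an arithmetic-mean centroid as the category representative vector, and cosine similarity as a stand-in for a well-known similarity measure. The helper names (`tf_vector`, `centroid`, `cosine`, `train`, `classify`) and the toy English tokens are hypothetical and are not the paper's Arabic dataset.

```python
import math
from collections import Counter


def tf_vector(tokens, vocab):
    """Raw term-frequency vector over a fixed vocabulary (assumed weighting)."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]  # out-of-vocabulary tokens are ignored


def centroid(vectors):
    """Mean vector, used here as a hypothetical category representative vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; stand-in for the paper's comparison measures."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def train(labeled_docs):
    """labeled_docs: list of (tokens, category) pairs."""
    vocab = sorted({t for tokens, _ in labeled_docs for t in tokens})
    by_cat = {}
    for tokens, cat in labeled_docs:
        by_cat.setdefault(cat, []).append(tf_vector(tokens, vocab))
    reps = {cat: centroid(vecs) for cat, vecs in by_cat.items()}
    return vocab, reps


def classify(tokens, vocab, reps):
    """Return the category whose representative vector is most similar."""
    v = tf_vector(tokens, vocab)
    return max(reps, key=lambda cat: cosine(v, reps[cat]))


# Toy, hypothetical training data (already tokenized).
docs = [
    (["match", "goal", "team"], "sport"),
    (["team", "coach", "goal"], "sport"),
    (["market", "bank", "price"], "economics"),
    (["price", "inflation", "bank"], "economics"),
]
vocab, reps = train(docs)
print(classify(["goal", "team", "win"], vocab, reps))  # sport
```

Swapping in a different similarity function only requires replacing `cosine` in `classify`, which is the point at which a measure such as origin-similarity would plug into this scheme. A realistic Arabic pipeline would also need tokenization, stop-word removal, and possibly stemming, none of which are shown here.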
