Arabic Named Entity Recognition: An SVM-based approach

2008 
The Named Entity Recognition (NER) task has been garnering significant attention because it has been shown to improve the performance of many Natural Language Processing (NLP) applications. More recently, there has been a surge in the development of NER systems for languages other than English. Given the relative abundance of resources for Arabic and a certain degree of maturity in the state of the art for processing Arabic, it is natural to see interest in developing NER systems for the language. In this paper, we investigate the impact of using different sets of features, both language independent and language specific, in a discriminative machine learning framework, namely Support Vector Machines (SVMs). We explore lexical, contextual and morphological features across nine data sets of different genres and annotations. We systematically measure the impact of the different features in isolation and in combination. The highest performance is achieved by combining all the features, which yields an F1 score of 82.71. In essence, combining language-independent features with language-specific ones yields the best performance on all the text genres we investigate.
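The abstract does not define the individual features, so the following is only an illustrative sketch under stated assumptions of how per-token feature groups might be built so that each group can be switched on alone or combined with the others. Every name here (lexical_features, contextual_features, morphological_features, token_features, f1_score), the +/-2 context window, the affix length of 3, and the "ال" (al-) definite-article check are hypothetical choices for illustration. In particular, the article check is a crude stand-in for a real Arabic morphological analyzer, not the paper's morphological features. The f1_score helper shows the standard F1 = 2PR / (P + R) measure in which the result above is reported.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for morphological analysis: the Arabic definite
# article "ال" (al-), i.e. alef U+0627 followed by lam U+0644.
ARABIC_DEFINITE_ARTICLE = "\u0627\u0644"

Features = Dict[str, float]


def lexical_features(token: str, max_affix: int = 3) -> Features:
    """Language independent: the word itself plus leading/trailing character n-grams."""
    feats: Features = {"w=" + token: 1.0}
    for n in range(1, min(max_affix, len(token)) + 1):
        feats[f"pre{n}=" + token[:n]] = 1.0
        feats[f"suf{n}=" + token[-n:]] = 1.0
    return feats


def contextual_features(tokens: List[str], i: int, window: int = 2) -> Features:
    """Language independent: neighboring words within +/-window of position i."""
    feats: Features = {}
    for off in range(-window, window + 1):
        if off == 0:
            continue  # the current word is covered by the lexical group
        j = i + off
        if j < 0:
            word = "<BOS>"  # padding before the sentence start
        elif j >= len(tokens):
            word = "<EOS>"  # padding after the sentence end
        else:
            word = tokens[j]
        feats[f"w[{off:+d}]=" + word] = 1.0  # keys like "w[-1]=..." or "w[+2]=..."
    return feats


def morphological_features(token: str) -> Features:
    """Language specific (hypothetical heuristic): flag and strip a leading definite article."""
    feats: Features = {}
    art = ARABIC_DEFINITE_ARTICLE
    if token.startswith(art) and len(token) > len(art):
        feats["has_al"] = 1.0
        feats["stem~=" + token[len(art):]] = 1.0  # rough stem guess, not a real analysis
    return feats


def token_features(tokens: List[str], i: int,
                   groups: Iterable[str] = ("lexical", "contextual", "morphological")) -> Features:
    """Merge only the enabled feature groups, so groups can be evaluated alone or combined."""
    enabled = set(groups)
    feats: Features = {}
    if "lexical" in enabled:
        feats.update(lexical_features(tokens[i]))
    if "contextual" in enabled:
        feats.update(contextual_features(tokens, i))
    if "morphological" in enabled:
        feats.update(morphological_features(tokens[i]))
    return feats


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For a sentence such as "زار الرئيس القاهرة" ("The president visited Cairo"), calling token_features with groups=("lexical",) and then with all three groups mirrors measuring features in isolation versus combined. The resulting dictionaries could then be vectorized with scikit-learn's DictVectorizer and classified per token with a linear SVM such as LinearSVC (one-vs-rest over the entity tags). This is one plausible pipeline, not necessarily the toolkit or configuration used in the paper.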