Exploring Features for Named Entity Recognition in Lithuanian Text Corpus

Jurgita Kapociute-Dzikiene, Anders Nøklestad, Janne Bondi Johannessen, Algis Krupavičius

2013

Despite the existence of effective methods for named entity recognition in widely used languages such as English, there is no clear answer as to which methods are most suitable for languages that are substantially different. In this paper we attempt to solve a named entity recognition task for Lithuanian, using a supervised machine learning approach and exploring different feature sets that vary in orthographic and grammatical information, context window size, etc. Although the performance is significantly higher when language-dependent features based on gazetteer lookup and automatic grammatical tools (part-of-speech tagger, lemmatizer or stemmer) are taken into account, we demonstrate that the performance does not degrade when features based on grammatical tools are replaced with affix information only. The best results (micro-averaged F-score = 0.895) were obtained using all available features, but the results decreased by only 0.002 when features based on grammatical tools were omitted.
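
As a rough illustration of what orthographic and affix features over a context window might look like, the following is a minimal sketch and not the authors' implementation. The function name `token_features`, the feature names, the window size and the maximum affix length are all assumptions made for this example; the abstract does not specify them.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, window: int = 1,
                   max_affix: int = 3) -> Dict[str, object]:
    """Build orthographic and affix features for tokens[i] and its neighbours.

    Hypothetical helper: feature names and defaults are illustrative only.
    """
    feats: Dict[str, object] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if j < 0 or j >= len(tokens):
            # Position falls outside the sentence: mark it as padding.
            feats[f"{offset}:pad"] = True
            continue
        tok = tokens[j]
        # Orthographic features: casing and digits.
        feats[f"{offset}:lower"] = tok.lower()
        feats[f"{offset}:is_title"] = tok.istitle()
        feats[f"{offset}:is_upper"] = tok.isupper()
        feats[f"{offset}:has_digit"] = any(c.isdigit() for c in tok)
        # Affix features: character prefixes and suffixes, standing in
        # for the output of a lemmatizer, stemmer or POS tagger.
        for n in range(1, max_affix + 1):
            if len(tok) >= n:
                feats[f"{offset}:prefix{n}"] = tok[:n].lower()
                feats[f"{offset}:suffix{n}"] = tok[-n:].lower()
    return feats


if __name__ == "__main__":
    sentence = ["Jonas", "gyvena", "Vilniuje"]
    for name, value in sorted(token_features(sentence, 2).items()):
        print(name, value)
```

Feature dictionaries of this kind can be passed to any supervised sequence or token classifier. Suffixes are a plausible cheap substitute for grammatical tools in a highly inflected language because word endings carry much of the case and number information.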
