Linguistically Motivated and Ontological Features for Vietnamese Named Entity Recognition

2012 
In this paper, we provide an in-depth analysis of the effect of linguistic and ontological features on the Vietnamese named entity recognition (NER) task. We show that, when plugged into an off-the-shelf learning framework, simple lexical word and bi-gram features make it possible to encode dependencies among possible NE labels in Vietnamese. Results achieved on a standard annotated corpus support our claim, with an accuracy comparable to the state of the art without any external resource. Moreover, when these features are augmented with ontological features from a large knowledge base, the results in both flat and structured classification are almost competitive. Our findings reveal interesting aspects of linguistically motivated features, including contextual and syntactic patterns for Vietnamese. Additionally, results achieved with ontological features show that they can be used to learn classes as specific as needed, resulting in the first high-performance Vietnamese structured NER system.
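To make the notion of lexical word and bi-gram features concrete, the following is a minimal sketch of how such features might be extracted per token before being passed to a sequence labeller. It is an illustration only, not the feature set used in the paper: the function names `token_features` and `sentence_features`, the feature keys, the boundary markers, and the example sentence are all assumptions introduced here.

```python
from typing import Dict, List

# Hypothetical sentence-boundary markers (an assumption, not from the paper).
BOS, EOS = "<s>", "</s>"


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Return lexical word and bi-gram features for the token at position i."""
    word = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else BOS
    next_tok = tokens[i + 1] if i < len(tokens) - 1 else EOS
    return {
        "w": word,                         # current word
        "w-1": prev_tok,                   # previous word
        "w+1": next_tok,                   # next word
        "w-1_w": f"{prev_tok} {word}",     # bi-gram ending at the current word
        "w_w+1": f"{word} {next_tok}",     # bi-gram starting at the current word
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Extract one feature dictionary per token in a pre-segmented sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Illustrative, made-up input: a word-segmented sentence in which
# multi-syllable words are joined with underscores.
example = ["Ông", "Nguyễn_Văn_A", "sống", "ở", "Hà_Nội"]
features = sentence_features(example)
```

Feature dictionaries of this shape are the kind of input that many off-the-shelf sequence-learning toolkits accept; which framework and which exact features the authors used is not stated in this abstract.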