The Optimization of Portuguese Named-Entity Recognition and Classification by Combining Local Grammars and Conditional Random Fields Trained with a Parsed Corpus

2021 
This article presents the results of a study of named-entity recognition and classification for Portuguese, focusing on temporal expressions. We used the Conditional Random Fields (CRF) probabilistic method, with features drawn from an automatically annotated parsed corpus and from local grammars. We found that Part-of-Speech (PoS) tags are the most useful information from the parsed corpus when used as a feature for this task. Combining these tags with other linguistic information from the parsed corpus produces no positive synergy. A NooJ local grammar built to recognize entities of the “Time” category (without distinguishing types and subtypes) provides a feature that outperforms PoS tags for CRF training in both precision and recall. Combining PoS tags with NooJ annotations brings no advantage.
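The feature configurations compared above (PoS tags alone, local-grammar annotations alone, or both) can be sketched as per-token feature dictionaries of the kind CRF toolkits consume. This is a minimal illustration under stated assumptions, not the authors' pipeline: the PoS tag set, the feature names, the one-token context window, and the example sentence with its Time annotations are all hypothetical, and the grammar flag is supplied as precomputed input rather than produced by NooJ itself.

```python
from typing import Dict, List, Tuple

# Hypothetical token representation:
# (word form, PoS tag from the parsed corpus, 1 if a local grammar marked it as Time else 0)
Token = Tuple[str, str, int]


def token_features(sent: List[Token], i: int,
                   use_pos: bool, use_grammar: bool) -> Dict[str, object]:
    """Build the feature dict for token i; PoS and grammar features are optional."""
    word, pos, in_time = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
        "word.istitle": word.istitle(),
    }
    if use_pos:
        feats["pos"] = pos
    if use_grammar:
        feats["grammar.time"] = bool(in_time)

    # Assumed context window: one token on each side.
    if i > 0:
        prev_word, prev_pos, prev_time = sent[i - 1]
        feats["-1:word.lower"] = prev_word.lower()
        if use_pos:
            feats["-1:pos"] = prev_pos
        if use_grammar:
            feats["-1:grammar.time"] = bool(prev_time)
    else:
        feats["BOS"] = True  # beginning of sentence

    if i < len(sent) - 1:
        next_word, next_pos, next_time = sent[i + 1]
        feats["+1:word.lower"] = next_word.lower()
        if use_pos:
            feats["+1:pos"] = next_pos
        if use_grammar:
            feats["+1:grammar.time"] = bool(next_time)
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sent_features(sent: List[Token], use_pos: bool = True,
                  use_grammar: bool = False) -> List[Dict[str, object]]:
    """Feature dicts for every token of one sentence."""
    return [token_features(sent, i, use_pos, use_grammar) for i in range(len(sent))]


# Hypothetical annotated sentence ("Chegou em 12 de março de 2021.") with BIO labels.
EXAMPLE_SENT: List[Token] = [
    ("Chegou", "VERB", 0), ("em", "ADP", 0), ("12", "NUM", 1), ("de", "ADP", 1),
    ("março", "NOUN", 1), ("de", "ADP", 1), ("2021", "NUM", 1), (".", "PUNCT", 0),
]
EXAMPLE_LABELS = ["O", "O", "B-TIME", "I-TIME", "I-TIME", "I-TIME", "I-TIME", "O"]

if __name__ == "__main__":
    for feats, label in zip(sent_features(EXAMPLE_SENT, use_pos=True, use_grammar=True),
                            EXAMPLE_LABELS):
        print(label, feats["word.lower"], feats["pos"], feats["grammar.time"])
```

Lists of such dictionaries, paired with BIO label sequences, match the input shape accepted by `sklearn_crfsuite.CRF.fit(X, y)`. Toggling `use_pos` and `use_grammar` mirrors the PoS-only, grammar-only, and combined settings, although the paper's actual feature templates may differ.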