On a diacritics due part-of-speech tagging ambiguity in Romanian

2017 
In this paper we investigate the ambiguity that a lack of diacritics introduces into Part-of-Speech tagging for Romanian. Some words, when written without diacritics, can be assigned more than one Part-of-Speech tag, even though they are unambiguous when the diacritics are present. We propose a method for dealing with this problem. By building on an existing Hidden Markov Model and employing a large dictionary and a set of trigrams, the solution makes Part-of-Speech tagging possible for Romanian even when the words are written without diacritics. Performance is evaluated on a set of sentences built manually to illustrate the situation.
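
The kind of ambiguity described above can be illustrated with a minimal sketch. It is not the paper's method: the toy lexicon, the tag names (PREP, NOUN, ADJ), and the helper functions strip_diacritics and build_undiacritized_index are assumptions made only for illustration. The sketch removes diacritics from each dictionary form and merges the tags of forms that collapse to the same spelling, so that words which become ambiguous only after the diacritics are lost can be listed.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Return the word with combining marks removed (e.g. 'pește' -> 'peste')."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical toy lexicon: each diacritized form is unambiguous on its own.
# The tag labels are illustrative and are not the paper's tag set.
LEXICON = {
    "peste": {"PREP"},  # "over"
    "pește": {"NOUN"},  # "fish"
    "tari": {"ADJ"},    # "strong" (plural)
    "țări": {"NOUN"},   # "countries"
    "casă": {"NOUN"},   # "house"
}


def build_undiacritized_index(lexicon):
    """Map each diacritics-free spelling to the union of tags of its sources."""
    index = defaultdict(set)
    for form, tags in lexicon.items():
        index[strip_diacritics(form)] |= tags
    return dict(index)


index = build_undiacritized_index(LEXICON)

# Spellings that carry more than one tag once diacritics are dropped.
ambiguous = {form: tags for form, tags in index.items() if len(tags) > 1}

for form in sorted(ambiguous):
    print(form, sorted(ambiguous[form]))
```

In this toy lexicon, "peste" and "tari" each end up with two candidate tags, while "casa" keeps a single one. A tagger working on text without diacritics would need context, such as the trigram statistics mentioned in the abstract, to choose between the candidates.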