On a diacritics due part-of-speech tagging ambiguity in Romanian
2017
In this paper we investigate the Part-of-Speech tagging ambiguity in Romanian that arises from missing diacritics. Some words, when written without diacritics, can be assigned more than one Part-of-Speech tag, even though no such ambiguity exists when the diacritics are present. We propose a method for dealing with this problem. By further developing an existing Hidden Markov Model and employing a large dictionary and a set of trigrams, the solution allows Part-of-Speech tagging for Romanian even when the words are written without diacritics. The performance is evaluated on a set of sentences manually built to illustrate the situation.
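The ambiguity described above can be illustrated with a minimal sketch. It is not the paper's method: it only shows how stripping diacritics merges distinct dictionary entries, so that one written form inherits several tags. The toy lexicon, the simplified tag labels, and the helper names `strip_diacritics` and `build_stripped_lexicon` are assumptions introduced for illustration.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word):
    """Remove combining marks, e.g. 'pește' -> 'peste'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical toy lexicon with simplified, illustrative tags.
# Each entry is unambiguous when diacritics are written.
LEXICON = {
    "pește": {"NOUN"},   # "fish"
    "peste": {"ADP"},    # "over"
    "să": {"PART"},      # subjunctive marker
    "sa": {"DET"},       # possessive "his/her"
    "țări": {"NOUN"},    # "countries"
    "tari": {"ADJ"},     # "strong" (plural)
    "carte": {"NOUN"},   # "book", has no diacritics at all
}


def build_stripped_lexicon(lexicon):
    """Merge the tag sets of all entries sharing a diacritic-free form."""
    stripped = defaultdict(set)
    for word, tags in lexicon.items():
        stripped[strip_diacritics(word)] |= tags
    return dict(stripped)


STRIPPED = build_stripped_lexicon(LEXICON)

# Forms that become ambiguous only because diacritics were dropped.
ambiguous = sorted(w for w, tags in STRIPPED.items() if len(tags) > 1)

if __name__ == "__main__":
    for form in ambiguous:
        print(form, sorted(STRIPPED[form]))
```

In a tagger, a merged tag set of this kind is what a sequence model would then have to disambiguate from the surrounding context.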