Morphological Disambiguation and Text Normalization for Southern
2014
We built a pipeline to normalize Quechua texts through morphological analysis and disambiguation. Word forms are analyzed by a set of cascaded finite state transducers which split the words and rewrite the morphemes to a normalized form. However, some of these morphemes, or rather morpheme combinations, are ambiguous, which may affect the normalization. For this reason, we disambiguate the morpheme sequences with conditional random fields. Once we know the individual morphemes of a word, we can generate the normalized word form from the disambiguated morphemes. 1
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
5
References
0
Citations
NaN
KQI