An account of the challenge of tagging a reference corpus for Brazilian Portuguese
2003
This article identifies and addresses the major linguistic/conceptual, as opposed to logistic, issues faced in the morphosyntactic tagging of MAC-Morpho, a 1.1 million word Brazilian Portuguese corpus of newspaper articles that has been developed in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria for establishing the tagset and analyze some interesting cases amongst the linguistic problems we faced in this work.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
5
References
40
Citations
NaN
KQI