Vidas Daudaravičius, Centre of Computational Linguistics

Andrius Utka

2007

As information technologies develop, large morphologically annotated corpora become a necessity for moving on to higher levels of language computerisation (e.g. automatic syntactic and semantic analysis, information extraction, machine translation). This article presents research on morphological disambiguation and the morphological annotation of the 100-million-word Lithuanian corpus. Statistical methods have made it possible to develop an automatic morphological annotation tool for Lithuanian with a disambiguation precision of 94%. Statistical data on the distribution of parts of speech, the most frequent wordforms, and lemmas in the annotated Corpus of the Contemporary Lithuanian Language are also presented.

Keywords:

Lithuanian
Machine translation
Information technology
Syntax
Annotation
Natural language processing
Part of speech
Computational linguistics
Information extraction
Artificial intelligence
Computer science
