A collocation extraction tool for Romanian

2015 
Background. Lexical knowledge, and in particular knowledge of multi-word expressions, is a cornerstone of language applications such as syntactic parsing and machine translation. Corpus-driven lexical acquisition is one of the main means of creating such knowledge, in order to build or consolidate dictionaries and similar lexical resources. We describe ongoing work on the corpus-based extraction of multi-word expressions, in particular collocations, for the Romanian language. Romanian became one of the 23 official languages of the European Union in 2007. It is the native language of around 24 million people, and it currently ranks 8th among the most widely spoken European languages worldwide, after Spanish (405 million native speakers), English (360), Portuguese (215), German (89), French (74), Italian (59), and Polish (40).¹ This high rank contrasts, however, with the relatively scarce development of language resources and tools for Romanian compared to other languages.