Using Embedding Models for Lexical Categorization in Morphologically Rich Languages

Borbála Siklósi

2016

Neural-network-based semantic embedding models are relatively new but popular tools in natural language processing. For English, continuous embedding vectors assigned to words have been shown to provide an adequate representation of their meaning. However, morphologically rich languages have not yet been the subject of experiments with these embedding models. In this paper, we investigate the performance of embedding models for Hungarian, trained on corpora with different levels of preprocessing. The models are evaluated on various lexical categorization tasks. They are also used to enrich the lexical database of a morphological analyzer with semantic features extracted automatically from the corpora.
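
To make the idea of lexical categorization with embeddings concrete, the following is a minimal sketch, not the method described in the paper. It assumes that each word already has an embedding vector and that a few seed words per category are known; a new word is then assigned to the category whose seed centroid is closest by cosine similarity. The vectors, the category labels, the seed lists, and the function names (`cosine`, `centroid`, `categorize`) are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def categorize(word, embeddings, seeds):
    """Return the label whose seed centroid is most similar to `word`."""
    centroids = {
        label: centroid([embeddings[w] for w in words])
        for label, words in seeds.items()
    }
    target = embeddings[word]
    return max(centroids, key=lambda label: cosine(target, centroids[label]))


# Hypothetical 3-dimensional toy vectors for a few Hungarian lemmas.
# These numbers are invented; real models use hundreds of dimensions
# and are trained on a corpus.
TOY_EMBEDDINGS = {
    "kutya": [0.90, 0.10, 0.00],   # dog
    "macska": [0.80, 0.20, 0.10],  # cat
    "ló": [0.85, 0.05, 0.10],      # horse
    "alma": [0.10, 0.90, 0.20],    # apple
    "körte": [0.00, 0.80, 0.30],   # pear
    "szilva": [0.10, 0.85, 0.10],  # plum
}

# Hypothetical seed words for two illustrative categories.
SEEDS = {
    "animal": ["kutya", "macska"],
    "fruit": ["alma", "körte"],
}

if __name__ == "__main__":
    for word in ("ló", "szilva"):
        print(word, categorize(word, TOY_EMBEDDINGS, SEEDS))
```

In a setting like the one the abstract describes, the labels produced this way could serve as candidate semantic features for lexicon entries, although the paper's actual procedure may differ from this nearest-centroid sketch.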

Keywords:

Computer science
Natural language processing
Artificial intelligence
Lexical database
Embedding
Categorization
Continuous embedding
Preprocessor
Information retrieval
