Using Embedding-Based Similarities to Improve Lexical Resources

2021 
In this paper we discuss the usefulness of applying semi-automatic checking procedures to existing thesauri for natural language processing, that is, large manually created lexical-semantic resources. The procedures are based on computing word vector representations and word semantic similarities from large text collections. The first procedure analyses discrepancies between corpus-based and thesaurus-based word similarities. The second procedure compares the hypernyms (more general words) described in a resource with those predicted from the relevant text collection. We applied both procedures to verify the Russian wordnet RuWordNet. They helped to find several significant mistakes and inconsistencies in word sense descriptions in RuWordNet that were difficult to detect because of the resource's large volume. The proposed procedures also show that an existing semantic resource can be quickly adapted to a new domain.
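The two checking procedures summarized above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy vectors in `EMB` stand in for embeddings trained on a large corpus, `HYPERNYMS` stands in for relations exported from a thesaurus such as RuWordNet, thesaurus similarity is taken as 1/(1 + shortest path length) in the hypernym graph, and hypernyms are predicted by voting over the hypernyms of the `k` nearest embedding neighbours. The function names (`discrepancies`, `predict_hypernyms`, `suspicious_hypernyms`) and the parameters (`threshold=0.7`, `k=2`) are hypothetical; these are simple common choices, not necessarily the measures or predictor used by the authors.

```python
import math
from collections import Counter, deque

# Hypothetical toy data. Real use would load vectors trained on a large corpus
# and relations exported from the thesaurus (e.g. RuWordNet).
EMB = {
    "cat": [0.9, 0.1, 0.0], "dog": [0.8, 0.2, 0.1], "animal": [0.7, 0.3, 0.2],
    "car": [0.1, 0.9, 0.2], "bus": [0.15, 0.85, 0.25],
    "truck": [0.2, 0.8, 0.3], "vehicle": [0.2, 0.7, 0.4],
}
# word -> hypernyms listed in the thesaurus; "truck" is deliberately attached
# to the wrong hypernym to simulate an error in the resource.
HYPERNYMS = {
    "cat": {"animal"}, "dog": {"animal"},
    "car": {"vehicle"}, "bus": {"vehicle"}, "truck": {"animal"},
}

def cosine(u, v):
    """Corpus-based similarity: cosine of two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_graph(hypernyms):
    """Undirected graph over hyponym-hypernym links."""
    graph = {}
    for word, hs in hypernyms.items():
        for h in hs:
            graph.setdefault(word, set()).add(h)
            graph.setdefault(h, set()).add(word)
    return graph

GRAPH = build_graph(HYPERNYMS)

def path_length(a, b):
    """Breadth-first shortest path between a and b, or None if unconnected."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in GRAPH.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def thesaurus_sim(a, b):
    """Assumed path-based thesaurus similarity: 1 / (1 + path length)."""
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1 + d)

def discrepancies(threshold=0.7):
    """Procedure 1 (sketch): pairs whose corpus similarity exceeds their
    thesaurus similarity by more than `threshold`, largest gap first."""
    words = sorted(EMB)
    out = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            gap = cosine(EMB[a], EMB[b]) - thesaurus_sim(a, b)
            if gap > threshold:
                out.append((a, b, round(gap, 3)))
    return sorted(out, key=lambda t: -t[2])

def predict_hypernyms(word, k=2):
    """Procedure 2 (sketch): hypernyms with the most votes among the
    k nearest thesaurus words in the vector space (ties are all kept)."""
    neighbours = sorted(
        (w for w in HYPERNYMS if w != word),
        key=lambda w: -cosine(EMB[word], EMB[w]),
    )[:k]
    votes = Counter(h for w in neighbours for h in HYPERNYMS[w])
    best = max(votes.values())
    return {h for h, n in votes.items() if n == best}

def suspicious_hypernyms(k=2):
    """Words whose listed hypernyms do not overlap the predicted ones."""
    return sorted(w for w in HYPERNYMS if not HYPERNYMS[w] & predict_hypernyms(w, k))
```

With this toy data both procedures single out `truck`: it is very close to `bus`, `car` and `vehicle` in the vector space but has no thesaurus path to them, and its nearest neighbours vote for `vehicle` rather than the listed `animal`. In a real resource the flagged items would be candidates for manual review rather than automatic corrections.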