Sidecar: Augmenting Word Embedding Models with Expert Knowledge

2020 
This work investigates a method for enriching pre-trained word embeddings with domain-specific information using a small, custom word embedding. On a classification task over text containing out-of-vocabulary expert jargon, the approach improves prediction accuracy with popular models such as Word2Vec (71.5% to 76.6%), GloVe (73.5% to 77.2%), and fastText (75.8% to 79.6%). Furthermore, an analysis of the approach shows that the representation of expert knowledge improves, with higher discrimination and lower inconsistency. Another advantage of this word embedding augmentation technique is that it is computationally inexpensive and leverages the general syntactic information encoded in large pre-trained word embeddings.
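
To make the idea concrete, below is a minimal sketch of one plausible way a small "sidecar" embedding could augment a large pre-trained model: concatenating the general-purpose vector with a small domain vector before feeding a classifier. The abstract does not specify the combination mechanism, so the concatenation, the dictionaries `pretrained` and `sidecar`, the dimensions, and the example tokens are all illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins: in practice `pretrained` would come from a large
# model such as Word2Vec, GloVe, or fastText, and `sidecar` from a small
# embedding trained only on in-domain expert text. Tokens and dimensions
# here are assumptions for the sketch.
PRETRAINED_DIM, SIDECAR_DIM = 300, 50
pretrained = {w: rng.normal(size=PRETRAINED_DIM) for w in ["the", "patient", "fracture"]}
sidecar = {w: rng.normal(size=SIDECAR_DIM) for w in ["fracture", "fx", "orif"]}


def embed_token(token):
    """Concatenate the general-purpose vector with the domain (sidecar) vector.

    Tokens missing from either table fall back to zeros, so expert jargon
    absent from the pre-trained vocabulary still receives a domain
    representation, while common words keep the general syntactic signal
    of the large model.
    """
    general = pretrained.get(token, np.zeros(PRETRAINED_DIM))
    domain = sidecar.get(token, np.zeros(SIDECAR_DIM))
    return np.concatenate([general, domain])


def embed_document(tokens):
    """Average augmented token vectors into a fixed-size classifier feature."""
    return np.mean([embed_token(t) for t in tokens], axis=0)


features = embed_document(["the", "patient", "has", "a", "fx"])
print(features.shape)  # (350,)
```

The resulting fixed-size feature vector could then be passed to any standard classifier; the sidecar component is cheap to train because it only covers the small in-domain vocabulary.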