Predicting Word Embeddings Variability.

Bénédicte Pierrejean,Ludovic Tanguy

Predicting Word Embeddings Variability.

2018

Bénédicte Pierrejean
Ludovic Tanguy

Neural word embeddings models (such as those built with word2vec) are known to have stability problems: when retraining a model with the exact same hyperparameters, words neighborhoods may change. We propose a method to estimate such variation, based on the overlap of neighbors of a given word in two models trained with identical hyperparameters. We show that this inherent variation is not negligible, and that it does not affect every word in the same way. We examine the influence of several features that are intrinsic to a word, corpus or embedding model and provide a methodology that can predict the variability (and as such, reliability) of a word representation in a semantic vector space.

Keywords:

Natural language processing
Machine learning
Artificial intelligence
Computer science

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations