Advances in Pre-Training Distributed Word Representations

2017 
Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia, and Web Crawl. In this paper, we show how to train high-quality word vector representations by using a combination of known tricks that are, however, rarely used together. The main result of our work is a new set of publicly available pre-trained models that outperform the current state of the art by a large margin on a number of tasks.
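For illustration, here is a minimal sketch of how such publicly released pre-trained vectors are typically loaded and queried. It assumes the models are distributed in the standard word2vec text format (as the fastText vectors accompanying this paper are) and uses gensim's KeyedVectors; the file name is hypothetical, and gensim is an external library, not part of the paper.

```python
# Minimal sketch: load pre-trained word vectors in the standard
# word2vec text format and query nearest neighbours by cosine
# similarity. "pretrained-vectors.vec" is a placeholder file name.
from gensim.models import KeyedVectors

# The first line of the file holds "<vocab_size> <dim>"; each
# following line is "<word> <v1> ... <v_dim>".
vectors = KeyedVectors.load_word2vec_format(
    "pretrained-vectors.vec", binary=False
)

# Print the five most similar words to a query word.
for word, score in vectors.most_similar("language", topn=5):
    print(f"{word}\t{score:.3f}")
```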