Semantics-assisted Wasserstein Learning for Topic and Word Embeddings
2020
The Wasserstein distance, defined as the cost of the optimal transport plan for moving between two histograms (with the ground cost measured via word embeddings), has proven effective in natural language processing tasks. In this paper, we extend Nonnegative Matrix Factorization (NMF) to a novel Wasserstein topic model, namely Semantics-Assisted Wasserstein Learning (SAWL), which learns topics and word embeddings simultaneously. In SAWL, we formulate an NMF-like unified objective that integrates the regularized Wasserstein distance loss with a factorization of word-context information. SAWL can therefore refine the word embeddings to capture corpus-specific semantics, so that the topics and word embeddings mutually boost each other. We analyze SAWL and provide dimensionality-dependent generalization bounds on its reconstruction errors. Experimental results indicate that SAWL outperforms state-of-the-art baseline models.
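The abstract's central quantity is a regularized Wasserstein distance between two word histograms, with the ground cost derived from word embeddings. Below is a minimal NumPy sketch of that computation via Sinkhorn iterations; the function name, toy embeddings, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sinkhorn_wasserstein(a, b, M, reg=0.1, n_iters=200):
    """Entropically regularized Wasserstein cost between histograms
    a and b under ground-cost matrix M, via Sinkhorn scaling."""
    K = np.exp(-M / reg)               # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # scale rows to match marginal a
        v = b / (K.T @ u)              # scale columns to match marginal b
    P = u[:, None] * K * v[None, :]    # transport plan diag(u) K diag(v)
    return np.sum(P * M)               # <P, M>, the transport cost

# Toy example: two documents as histograms over a 4-word vocabulary,
# with the ground cost given by distances between (random) embeddings.
E = np.random.rand(4, 50)              # word embeddings (vocab x dim)
M = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=-1)
a = np.array([0.5, 0.5, 0.0, 0.0])     # word distribution of document 1
b = np.array([0.0, 0.0, 0.5, 0.5])     # word distribution of document 2
print(sinkhorn_wasserstein(a, b, M))
```

In a topic model like the one described, this distance would serve as the reconstruction loss between a document's empirical word histogram and its topic-based reconstruction; the entropic regularization makes the loss smooth and efficiently computable.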