$f_{BGD}$: Learning Embeddings From Positive-Only Data with BGD

2018 
Learning embeddings from sparse positive data is a fundamental task in several domains, such as natural language processing (NLP), computer vision (CV), and information retrieval (IR). By far, the most widely used optimization methods rely on stochastic gradient descent (SGD) with negative sampling (NS), particularly for learning from large-scale data. However, the convergence and effectiveness of SGD depend largely on the sampling distribution of negative examples. Moreover, SGD suffers from dramatic fluctuations due to its one-example-at-a-time update scheme. To address these common issues of existing embedding methods, we present a generic batch gradient descent optimizer ($f_{BGD}$) that learns embeddings from \emph{all} training examples without sampling. Our main contribution is that we accelerate $f_{BGD}$ by several orders of magnitude, bringing its time complexity to the same level as that of NS-based SGD. We evaluate $f_{BGD}$ on three well-known tasks across domains, namely, word embedding (NLP), image classification (CV), and item recommendation (IR). Experiments show that $f_{BGD}$ significantly outperforms NS-based SGD models on all three tasks with comparable efficiency. Code will be made available.
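As a rough illustration of how a full-batch objective over all examples can be made tractable without negative sampling, the sketch below computes gradients of a weighted squared loss over every user-item pair of a factorization model, using cached $k \times k$ Gram matrices so that unobserved (negative) pairs are never enumerated. This is only a minimal sketch under assumed choices: the squared-loss form, the uniform weight `w0` on missing entries, and all function and variable names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def full_batch_gradients(P, Q, pos, w_pos=1.0, w0=0.1):
    """Gradients of a weighted squared loss over ALL user-item pairs,
    computed without enumerating the missing (negative) pairs.

    Assumed loss (target 1 on observed positives, 0 elsewhere):
        L = sum_{(u,i) in pos} w_pos * (1 - p_u . q_i)^2
          + w0 * sum_{(u,i) not in pos} (p_u . q_i)^2

    The all-pairs term is evaluated through the cached Gram matrices
    Q^T Q and P^T P, so the cost is O(|pos| k + (m + n) k^2)
    instead of the naive O(m n k).
    """
    grad_P = np.zeros_like(P)
    grad_Q = np.zeros_like(Q)

    # Term over ALL pairs, via Gram-matrix caching:
    # d/dP of w0 * sum_{u,i} (p_u . q_i)^2 = 2 * w0 * P @ (Q^T Q)
    QtQ = Q.T @ Q            # k x k
    PtP = P.T @ P            # k x k
    grad_P += 2.0 * w0 * P @ QtQ
    grad_Q += 2.0 * w0 * Q @ PtP

    # Correction on the observed positives only: add the positive-pair
    # loss and subtract the w0 * s^2 contribution already counted above.
    for u, i in pos:
        s = P[u] @ Q[i]
        coef = 2.0 * (w_pos * (s - 1.0) - w0 * s)
        grad_P[u] += coef * Q[i]
        grad_Q[i] += coef * P[u]

    return grad_P, grad_Q
```

Under these assumptions, a batch update is simply `P -= lr * grad_P; Q -= lr * grad_Q`, so every missing pair contributes to each step while the per-step cost stays on the order of a sampled-SGD epoch.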