Sequence-Based Word Embeddings for Effective Text Classification

2021 
In this work we present DiVe (Distance-based Vector Embedding), a new word embedding technique based on the Logistic Markov Embedding (LME). We generalize LME to support different distance metrics and address its scalability issues using negative sampling, making DiVe practical for large datasets. To evaluate the quality of the embeddings DiVe produces, we use them to train standard machine learning classifiers on a range of Natural Language Processing (NLP) tasks. Our experiments show that DiVe outperforms existing, more complex approaches while preserving simplicity and scalability.
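To make the idea concrete, below is a minimal sketch of a distance-based sequence embedding trained with negative sampling, in the spirit of the LME-style objective the abstract describes. The class name `DiveSketch`, the per-word bias term `b`, the negative-squared-Euclidean score, and all hyper-parameters are assumptions for illustration; the paper's exact objective and training procedure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DiveSketch:
    """Hypothetical sketch: LME-style distance scoring between consecutive
    words, trained with a logistic loss over one observed (positive) pair
    and a few uniformly sampled negative pairs."""

    def __init__(self, vocab_size, dim=50, lr=0.05, n_neg=5):
        self.E = rng.normal(scale=0.1, size=(vocab_size, dim))  # word vectors
        self.b = np.zeros(vocab_size)  # per-word bias (an assumption)
        self.lr, self.n_neg = lr, n_neg
        self.vocab_size = vocab_size

    def score(self, w, c):
        # LME-style score: negative squared Euclidean distance plus bias,
        # so nearby vectors get high probability of following each other
        diff = self.E[w] - self.E[c]
        return -np.dot(diff, diff) + self.b[c]

    def train_pair(self, w, c):
        # one positive (observed bigram) and n_neg sampled negatives
        targets = [(c, 1.0)] + [(int(rng.integers(self.vocab_size)), 0.0)
                                for _ in range(self.n_neg)]
        for t, label in targets:
            g = sigmoid(self.score(w, t)) - label      # dLoss/dScore
            diff = self.E[w] - self.E[t]
            self.E[w] -= self.lr * g * (-2.0 * diff)   # dScore/dE[w]
            self.E[t] -= self.lr * g * (2.0 * diff)    # dScore/dE[t]
            self.b[t] -= self.lr * g

# toy usage: train on consecutive-word pairs from an integer-encoded corpus
corpus = [0, 1, 2, 1, 3, 0, 2]
model = DiveSketch(vocab_size=4, dim=8)
for w, c in zip(corpus, corpus[1:]):
    model.train_pair(w, c)
```

Note the design choice negative sampling buys here: instead of normalizing the score over the whole vocabulary at every step (the scalability bottleneck in the original LME), each update touches only the observed pair and a handful of sampled negatives.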