Better Word Representations with Word Weight

2019 
As a fundamental task of natural language processing, text classification has been widely used in applications such as sentiment analysis and spam detection. In recent years, continuous-valued word embeddings learned by neural networks have attracted extensive attention. Although word embeddings achieve impressive results in capturing similarities and regularities between words, they fail to highlight the words that matter most for identifying a text's category. This deficiency can be attenuated by word weights, which convey each word's contribution to text categorization. Toward this end, this paper proposes an effective text classification scheme that incorporates word weights into word embeddings. Specifically, to enrich word representations, a bidirectional gated recurrent unit (Bi-GRU) network is first employed to capture the context information of words. The word weights yielded by term frequency (TF) are then used to modulate the Bi-GRU word representations when constructing the text representation. Extensive experiments on several large text datasets verify that the proposed scheme outperforms state-of-the-art methods in classification accuracy.
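The described pipeline (embed words, contextualize them with a Bi-GRU, modulate each word representation by its TF weight, and pool into a text representation) can be illustrated with a minimal sketch. The following PyTorch code is an assumption-laden illustration, not the authors' exact implementation: the abstract does not specify the TF normalization, layer sizes, or pooling operator, so the module name `WeightedBiGRUClassifier`, the count-over-length TF weighting, and the sum pooling are all hypothetical choices.

```python
import torch
import torch.nn as nn

class WeightedBiGRUClassifier(nn.Module):
    """Sketch of a TF-weighted Bi-GRU text classifier (illustrative only)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bigru = nn.GRU(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids, tf_weights):
        # token_ids:  (batch, seq_len) integer word indices
        # tf_weights: (batch, seq_len) per-token TF weights, e.g.
        #             count(word in doc) / doc_length (assumed normalization)
        embedded = self.embedding(token_ids)           # (B, T, E)
        contextual, _ = self.bigru(embedded)           # (B, T, 2H) context-enriched words
        # Modulate each contextual word representation by its TF weight,
        # then sum over the sequence to obtain a fixed-size text representation.
        weighted = contextual * tf_weights.unsqueeze(-1)  # (B, T, 2H)
        text_repr = weighted.sum(dim=1)                   # (B, 2H)
        return self.classifier(text_repr)                 # (B, num_classes)

# Usage sketch with toy dimensions:
model = WeightedBiGRUClassifier(vocab_size=10000, embed_dim=100,
                                hidden_dim=128, num_classes=2)
token_ids = torch.randint(1, 10000, (4, 20))      # 4 documents, 20 tokens each
tf_weights = torch.rand(4, 20)                    # placeholder TF weights
logits = model(token_ids, tf_weights)             # (4, 2)
```

The key design point is that TF weighting happens after the Bi-GRU, so each word's context-aware representation, rather than its raw embedding, is scaled by its estimated importance before pooling.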