Social Media Text Classification Method Based on Character-Word Feature Self-attention Learning

2020 
The long-tail effect and the large number of out-of-vocabulary (OOV) words in social media texts cause severe feature sparsity and reduce classification accuracy. To address this problem, a social media text classification method based on character-word feature self-attention learning is proposed. Global features are constructed at the character level to learn the attention weight distribution, and the existing multi-head attention mechanism is improved to reduce parameter scale and computational complexity. To further analyze character-word feature fusion, an OOV sensitivity index is proposed to measure the impact of OOV words on different types of features. Experiments on several social media text classification tasks indicate that the proposed method improves classification accuracy by effectively fusing word features and character features. Moreover, the quantitative results of the OOV sensitivity index verify the feasibility and effectiveness of the proposed method.
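The sparsity problem that motivates character-level features can be illustrated with a minimal sketch. It compares the plain OOV rate of an unseen post at the word level and at the character level. This is not the OOV sensitivity index proposed in the paper, whose definition is not given in the abstract; the helper `oov_rate` and the toy sentences are hypothetical and chosen only for illustration.

```python
def oov_rate(tokens, vocab):
    """Return the fraction of tokens that do not appear in vocab."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocab) / len(tokens)


# Toy "training" corpus and an informal test post (both invented for this sketch).
train = ["good morning everyone", "see you soon"]
test = "gooood morning frens"

# Vocabularies observed during training, at word level and character level.
word_vocab = {w for s in train for w in s.split()}
char_vocab = {c for s in train for c in s if not c.isspace()}

# OOV rate of the test post under each tokenization.
word_oov = oov_rate(test.split(), word_vocab)
char_oov = oov_rate([c for c in test if not c.isspace()], char_vocab)

print(f"word-level OOV rate: {word_oov:.3f}")
print(f"char-level OOV rate: {char_oov:.3f}")
```

In this toy case two of the three words are unseen, while only one of the eighteen characters is, which is the kind of gap that makes character features less sensitive to OOV words than word features.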