Research on Text Classification Method Based on Word2vec and Improved TF-IDF

Tao Zhang, Luyao Wang

2020

TF-IDF is one of the most widely used feature weighting methods. In text classification, however, traditional TF-IDF does not capture how terms are distributed across classes, and the feature matrix it produces is large and sparse. To address both problems, this paper proposes using a chi-square feature extraction algorithm to compensate for the missing between-class distribution information, and using word2vec to generate a fixed-dimensional real-valued matrix. The experimental results show that the new method is significantly better than traditional feature extraction methods on precision, recall, F1, and ROC_AUC.
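The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea under stated assumptions: each term's TF-IDF weight is multiplied by its chi-square score (the maximum over classes), and the resulting weights are used to average word vectors into one fixed-dimensional document vector. The multiplicative combination, the smoothed IDF, the function names, and the tiny hand-written `EMBEDDINGS` table (standing in for vectors from a trained word2vec model) are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (smoothed IDF, an assumption)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    weights = []
    for d in docs:
        counts = Counter(d)
        weights.append({
            t: (c / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in counts.items()
        })
    return weights


def chi_square(docs, labels):
    """Return {term: chi-square statistic, maximized over classes}."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    scores = {}
    for t in set().union(*doc_sets):
        best = 0.0
        for cls in set(labels):
            # 2x2 contingency table of (term present?) x (document in class?)
            a = sum(1 for s, y in zip(doc_sets, labels) if y == cls and t in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != cls and t in s)
            c = sum(1 for s, y in zip(doc_sets, labels) if y == cls and t not in s)
            d = n - a - b - c
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:  # skip degenerate tables (e.g. term in every document)
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[t] = best
    return scores


def doc_vector(tfidf_doc, chi, embeddings, dim):
    """Average word vectors, weighting each term by TF-IDF times chi-square."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, w in tfidf_doc.items():
        if term not in embeddings:
            continue  # out-of-vocabulary terms are ignored
        w *= chi.get(term, 0.0)
        weight_sum += w
        for i, x in enumerate(embeddings[term]):
            total[i] += w * x
    if weight_sum == 0.0:
        return total  # all-zero vector when no weighted term is available
    return [x / weight_sum for x in total]


# Toy data; the 2-d vectors are hypothetical stand-ins for word2vec output.
DOCS = [["goal", "match", "team"], ["team", "goal", "score"],
        ["stock", "market", "price"], ["market", "price", "team"]]
LABELS = ["sport", "sport", "finance", "finance"]
EMBEDDINGS = {"goal": [1.0, 0.0], "match": [0.9, 0.1], "team": [0.7, 0.3],
              "score": [0.8, 0.2], "stock": [0.0, 1.0], "market": [0.1, 0.9],
              "price": [0.2, 0.8]}

CHI = chi_square(DOCS, LABELS)
VECTORS = [doc_vector(w, CHI, EMBEDDINGS, dim=2) for w in tf_idf(DOCS)]
```

In this toy corpus, "goal" appears only in the sport documents while "team" appears in both classes, so "goal" receives the larger chi-square score and contributes more to the document vector. Every document maps to a vector of the same length regardless of vocabulary size, which is the property that avoids the large, sparse TF-IDF matrix.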

Keywords:

tf–idf
Artificial intelligence
Word2vec
Computer science
Natural language processing
feature extraction algorithm
classification methods
Matrix (mathematics)
Feature extraction
feature matrix
Chi-square test
Pattern recognition
