Detecting nonsense for Chinese comments based on logistic regression

Ren Zhuolin,Chen Guang,Chen Shu

2016

To understand internet users' opinions accurately from Chinese news comments, a clear definition of nonsense is presented, and a detection model based on logistic regression (LR) is proposed. The detection of nonsense is treated as a binary classification problem. Besides traditional lexical features, we propose three kinds of features based on emotion, structure, and relevance. Using these features, we train an LR model and demonstrate its effectiveness in understanding Chinese news comments. We find that each of the proposed features significantly improves the result. In our experiments, we achieve a prediction accuracy of 84.3%, an improvement of 7 percentage points over the 77.3% baseline.
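
The binary-classification setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the four feature columns (lexical, emotion, structure, relevance), the toy values, the learning rate, and the function names `train_lr` and `predict` are all assumptions made for illustration, and the abstract does not say how the real features are computed.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    # Linear score w.x + b for one feature vector.
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_lr(X, y, lr=0.5, epochs=2000):
    # Full-batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # dLoss/dz for this sample
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # 1 = nonsense, 0 = meaningful comment.
    return 1 if sigmoid(score(w, b, x)) >= 0.5 else 0

# Hypothetical feature vectors, one per comment:
# [lexical, emotion, structure, relevance], each scaled to [0, 1].
X = [
    [0.1, 0.9, 0.2, 0.1],
    [0.2, 0.8, 0.1, 0.0],
    [0.0, 0.7, 0.3, 0.2],
    [0.8, 0.3, 0.7, 0.9],
    [0.9, 0.2, 0.8, 0.8],
    [0.7, 0.4, 0.9, 0.7],
]
y = [1, 1, 1, 0, 0, 0]  # toy labels, not data from the paper

w, b = train_lr(X, y)
train_predictions = [predict(w, b, x) for x in X]
```

In this sketch each comment is reduced to a fixed-length numeric vector before training, so additional feature groups would simply add columns to `X`. In practice a library implementation with regularization and held-out evaluation would likely be preferable.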

Keywords:

Logistic regression
Nonsense
tf–idf
Data mining
Search engine
Feature extraction
Engineering
Natural language processing
Artificial intelligence
