A Chinese Text Similarity Calculation Algorithm Based on DF_LDA

Chao Zhang, Li Chen, Qiong Li

2016

To reduce the complexity of Chinese text similarity calculation and improve text clustering accuracy, this paper proposes a new text similarity calculation algorithm based on DF_LDA. First, the DF method is used for feature extraction; next, the LDA method is used to construct a topic model of the text; finally, the resulting DF_LDA model is used to calculate text similarity. Because it takes both semantic and word-frequency information into account, the new method improves text clustering precision. In addition, DF_LDA reduces the dimensionality of the text feature vectors twice, which shortens the time needed to calculate text similarity and speeds up text clustering. Experiments on the TanCorp-12-Txt and FuDanCorp datasets show that the proposed method reduces modeling time and improves text clustering accuracy.
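
The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: DF is read as document frequency, the documents are already word-segmented, LDA is fitted with a plain collapsed Gibbs sampler, and cosine similarity over topic distributions stands in for the paper's similarity measure, which the abstract does not specify. The function names, thresholds (`min_df`, `max_df_ratio`), and hyperparameters (`k`, `alpha`, `beta`) are hypothetical.

```python
import math
import random
from collections import Counter


def df_select(docs, min_df=2, max_df_ratio=0.9):
    """First reduction: keep terms whose document frequency is in range (assumed thresholds)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(t for t, c in df.items() if min_df <= c <= max_df_ratio * n)


def lda_gibbs(docs, vocab, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Second reduction: collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    index = {t: i for i, t in enumerate(vocab)}
    corpus = [[index[t] for t in doc if t in index] for doc in docs]
    v = len(vocab)
    n_dk = [[0] * k for _ in corpus]    # topic counts per document
    n_kw = [[0] * v for _ in range(k)]  # word counts per topic
    n_k = [0] * k                       # total word count per topic
    z = []                              # topic assignment of every token
    for d, doc in enumerate(corpus):
        assignments = []
        for w in doc:
            t = rng.randrange(k)
            assignments.append(t)
            n_dk[d][t] += 1
            n_kw[t][w] += 1
            n_k[t] += 1
        z.append(assignments)
    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the token, then resample its topic from the conditional.
                n_dk[d][t] -= 1
                n_kw[t][w] -= 1
                n_k[t] -= 1
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][w] += 1
                n_k[t] += 1
    return [
        [(n_dk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(corpus)
    ]


def cosine(p, q):
    """Cosine similarity between two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Toy, pre-segmented documents (illustrative data only).
docs = [
    ["足球", "比赛", "球队", "进球"],
    ["足球", "球队", "教练", "比赛"],
    ["股票", "市场", "投资", "基金"],
    ["股票", "投资", "市场", "银行"],
]
vocab = df_select(docs)                # DF step: rare terms are dropped
theta = lda_gibbs(docs, vocab, k=2)    # LDA step: documents become k-dimensional
similarity = [[cosine(p, q) for q in theta] for p in theta]
```

The similarity matrix is computed over `k`-dimensional topic vectors rather than vocabulary-sized term vectors, which is where the time saving claimed in the abstract would come from; `similarity` could then be passed to any clustering routine.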

Keywords:

Word lists by frequency
Feature vector
Feature extraction
Topic model
Artificial intelligence
Document clustering
Machine learning
Algorithm
Pattern recognition
Computer science
Calculation algorithm
