A Chinese Text Similarity Calculation Algorithm Based on DF_LDA

2016 
To reduce the complexity of Chinese text similarity calculation and improve text clustering accuracy, this paper proposes a new text similarity calculation algorithm based on DF_LDA. First, the DF method is used for feature extraction; next, the LDA method is used to construct a text topic model; finally, the resulting DF_LDA model is used to calculate text similarity. Because it takes both text semantics and word frequency information into account, the new method can improve text clustering precision. In addition, the DF_LDA method reduces the dimensionality of the text feature vector twice, which saves similarity calculation time and speeds up text clustering. Experiments on the TanCorp-12-Txt and FuDanCorp datasets demonstrate that the proposed method reduces modeling time and improves text clustering accuracy.
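The three steps in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: DF is taken to mean document-frequency filtering, the texts are already segmented into words, LDA is fitted with a simple collapsed Gibbs sampler, and similarity is measured as the cosine of the per-document topic distributions (the abstract does not name the measure). The function names, thresholds, and toy documents are all hypothetical.

```python
import math
import random
from collections import Counter

def df_filter(docs, min_df=2, max_df_ratio=1.0):
    """First reduction: keep terms with min_df <= DF <= max_df_ratio * N."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(t for t, c in df.items() if min_df <= c <= max_df_ratio * n)
    keep = set(vocab)
    return [[t for t in doc if t in keep] for doc in docs], vocab

def lda_gibbs(docs, vocab, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Second reduction: map each document to a k-dimensional topic vector."""
    rng = random.Random(seed)
    index = {t: i for i, t in enumerate(vocab)}
    v = len(vocab)
    ndk = [[0] * k for _ in docs]      # topic counts per document
    nkw = [[0] * v for _ in range(k)]  # word counts per topic
    nk = [0] * k                       # total words per topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for t in doc:
            topic = rng.randrange(k)
            zd.append(topic)
            ndk[d][topic] += 1
            nkw[topic][index[t]] += 1
            nk[topic] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                w, old = index[t], z[d][i]
                # Remove the token, then resample its topic from the
                # conditional distribution given all other assignments.
                ndk[d][old] -= 1
                nkw[old][w] -= 1
                nk[old] -= 1
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                new = rng.choices(range(k), weights=weights)[0]
                z[d][i] = new
                ndk[d][new] += 1
                nkw[new][w] += 1
                nk[new] += 1
    # Smoothed document-topic distributions; each row sums to 1.
    return [
        [(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(docs)
    ]

def cosine(p, q):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

# Toy, pre-segmented documents (hypothetical data, not from the paper's corpora).
docs = [
    ["足球", "比赛", "球员", "的"],
    ["足球", "球员", "进球", "的"],
    ["股票", "市场", "投资", "的"],
    ["股票", "投资", "基金", "的"],
]
filtered, vocab = df_filter(docs, min_df=2, max_df_ratio=0.75)
theta = lda_gibbs(filtered, vocab, k=2)
print(cosine(theta[0], theta[1]), cosine(theta[0], theta[2]))
```

In this sketch the DF step drops both rare terms and the term that appears in every document, so the sampler works on a smaller vocabulary, and the similarity step compares vectors of length `k` rather than vectors over the full vocabulary. That is one way to read the abstract's claim that the feature dimension is reduced twice.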