Chinese Document Keyword Extraction Algorithm Based on FP-growth

2016 
To address the heavy computation and complex calculation process of existing keyword extraction algorithms, this paper proposes an FP-Growth-based algorithm for extracting keywords from Chinese documents. The FP-Growth algorithm mines word co-occurrence information, which excludes the interference of noise words, and semantic similarity computation over lexical chains eliminates the influence of synonyms. A feature fusion method then considers the frequency, part of speech, and position of each word: TF-IDF is combined with the "double comparing method" to calculate the weights of these characteristic factors, and a word weight function computes the final weight of each candidate word. Experimental results show that the proposed method improves accuracy and recall by about 10% compared with traditional TF-IDF.
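The first and last stages of this pipeline can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the document is already segmented into words, it replaces FP-Growth with a brute-force count of co-occurring word pairs (which yields the same frequent 2-itemsets on small inputs but without the FP-tree), and it omits the lexical-chain, part-of-speech, and position features. The function names, the `min_support` parameter, and the smoothed IDF variant are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_pairs(sentences, min_support=2):
    """Return word pairs that co-occur in at least min_support sentences.

    Brute-force stand-in for FP-Growth, restricted to 2-itemsets.
    """
    counts = Counter()
    for words in sentences:
        # Each sentence is one transaction; duplicates within it count once.
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def tf_idf(term, doc, corpus):
    """TF-IDF with a smoothed IDF (assumed variant, always positive)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf


def candidate_weights(sentences, corpus, min_support=2):
    """Weight only the words that survive the co-occurrence filter."""
    pairs = frequent_pairs(sentences, min_support)
    candidates = {word for pair in pairs for word in pair}
    doc = [word for words in sentences for word in words]
    return {word: tf_idf(word, doc, corpus) for word in candidates}


# Hypothetical pre-segmented document and background corpus.
sentences = [
    ["keyword", "extraction", "algorithm"],
    ["keyword", "extraction", "noise"],
    ["keyword", "weight"],
]
corpus = [
    [word for words in sentences for word in words],
    ["weather", "report", "noise"],
]
weights = candidate_weights(sentences, corpus)
```

Here the word "noise" appears in the document but never co-occurs frequently with another word, so it is dropped before weighting, which is the filtering role the abstract assigns to the co-occurrence step.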