A Method for Thematic Term Extraction Based on Word Position Weight

2012 
Thematic terms represent the main idea of a document well, and thematic term extraction is an important research area in Natural Language Processing. This paper proposes a novel thematic term extraction method with two stages: generating a candidate thematic term set based on the position weights of terms, and extracting thematic terms based on the incremental weight of the thematic term set. The generation algorithm assigns each term a weight according to its positions in a document, and then builds the candidate thematic term set from those weights. The extraction algorithm calculates the incremental weight of each candidate term and selects the terms whose incremental weights are larger than a given threshold. Experimental results on two corpora show that the overall satisfaction of thematic term extraction with our method exceeds 90%.
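The two stages described above can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything concrete here is an assumption made for illustration: the position labels and their weights in `POSITION_WEIGHTS`, the fixed candidate set size, and the definition of incremental weight as a term's share of the candidate set's total weight. All function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical position weights; the abstract does not state the real values.
POSITION_WEIGHTS = {"title": 3.0, "first_sentence": 2.0, "body": 1.0}


def position_weights(occurrences):
    """Sum a position weight for every (term, position) occurrence."""
    weights = defaultdict(float)
    for term, position in occurrences:
        weights[term] += POSITION_WEIGHTS[position]
    return dict(weights)


def candidate_terms(weights, size):
    """Keep the `size` highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:size])


def extract_thematic_terms(candidates, threshold):
    """Select candidates whose incremental weight exceeds `threshold`.

    Assumption: the incremental weight of a term is the share of the
    candidate set's total weight that the term contributes. The paper's
    actual definition may differ.
    """
    total = sum(candidates.values())
    if total == 0:
        return []
    return [term for term, w in candidates.items() if w / total > threshold]


# Toy document: each entry is (term, position of one occurrence).
occurrences = [
    ("term", "title"), ("extraction", "title"),
    ("term", "first_sentence"), ("weight", "body"),
    ("term", "body"), ("extraction", "body"), ("corpus", "body"),
]

weights = position_weights(occurrences)
candidates = candidate_terms(weights, size=3)
thematic = extract_thematic_terms(candidates, threshold=0.3)
print(thematic)
```

In this toy input, terms that occur in the title accumulate most of the weight, so only they pass the threshold. Both the candidate set size and the threshold would need to be tuned on real data.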