HSCKE: A Hybrid Supervised Method for Chinese Keywords Extraction

2020 
Automatic keyword extraction refers to extracting words or phrases from a single text or a text collection. Supervised methods outperform unsupervised methods, but they require a large labeled corpus for training. To address this problem, extra knowledge is obtained through labels generated by other tools. Moreover, preprocessing Chinese text is more challenging than preprocessing English text because word segmentation produces fragments. Hence, named entity recognition is introduced into the preprocessing step to improve accuracy. In addition, a text consists of separate parts, and each part conveys information to readers at a different level. Thus, we present a priority-based text weighting method that takes into account the importance of the different parts of a text. In this paper, we integrate the three ideas above and propose a novel hybrid supervised method for Chinese keyword extraction (HSCKE). To evaluate the performance of the proposed approach, we compare HSCKE with four of the most commonly used methods on two typical Chinese keyword extraction datasets. The experimental results show that the proposed approach achieves the best performance in terms of precision, recall, and F1 score.
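The priority-based weighting idea can be illustrated with a minimal sketch. It is not the HSCKE implementation: the part names (`title`, `abstract`, `body`), the weight values, and the helper functions are all hypothetical, and the input is assumed to be already segmented into tokens.

```python
from collections import Counter

# Hypothetical priority weights per text part (assumption: the paper's
# actual parts and weight values are not stated in the abstract).
PART_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def weighted_term_scores(parts, weights=PART_WEIGHTS):
    """Score each token by its occurrences, scaled by the weight of its part."""
    scores = Counter()
    for part_name, tokens in parts.items():
        weight = weights.get(part_name, 1.0)  # unknown parts get a neutral weight
        for token in tokens:
            scores[token] += weight
    return scores


def top_keywords(parts, k=3):
    """Return the k highest-scoring tokens; ties are broken by token order."""
    scores = weighted_term_scores(parts)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:k]]


# Toy, pre-segmented document (illustrative data only).
doc = {
    "title": ["关键词", "抽取"],
    "abstract": ["关键词", "抽取", "方法"],
    "body": ["方法", "实验", "关键词", "实验"],
}

print(top_keywords(doc, k=3))
```

In this toy example a token in the title counts three times as much as the same token in the body, so terms that appear in high-priority parts rise in the ranking even when they are less frequent overall.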