A Novel Text Feature Weight Calculation Method Applied to Power Field

2016 
Feature extraction is an important prerequisite for classifying text effectively and automatically. The TF-IDF algorithm is widely used to express text feature weights. However, it ignores how a feature is distributed across categories, and therefore cannot reflect the differences between categories. TF-IDF works poorly in the power field because the focus and expression of news texts vary greatly across sub-fields. Accordingly, this paper proposes a novel text feature weight calculation algorithm for the power field, called the TF-DFDP algorithm. The TF-DFDP algorithm introduces FC (Frequency in Category), DC (Dispersion in Category), PS (Paragraph Span Factor) and CW (Category Weight Factor). Experimental results show that the new algorithm achieves higher precision, higher recall and a better F1 value.
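The TF-IDF limitation described above can be illustrated with a minimal sketch of classic TF-IDF. The sketch assumes tokenized documents and the common variant TF = count / document length, IDF = log(N / df). The abstract does not state which TF-IDF variant served as the baseline. The toy corpus, its category labels and the `category_spread` helper are hypothetical. This is not an implementation of TF-DFDP, because the FC, DC, PS and CW formulas are not given here.

```python
import math
from collections import Counter

def idf(docs):
    """IDF(t) = log(N / df(t)), where df(t) is the number of documents containing t."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tf_idf(docs):
    """Per-document TF-IDF weights, with TF = term count / document length."""
    idf_values = idf(docs)
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * idf_values[t] for t, c in counts.items()})
    return weights

def category_spread(corpus, term):
    """Hypothetical helper: number of documents in each category that contain the term."""
    return Counter(label for doc, label in corpus if term in doc)

# Hypothetical toy corpus of (tokens, category) pairs.
corpus = [
    (["outage", "grid", "repair"], "operations"),
    (["outage", "storm", "crew"], "operations"),
    (["grid", "tariff", "price"], "market"),
    (["tariff", "market", "bid"], "market"),
]
docs = [tokens for tokens, _ in corpus]

idf_values = idf(docs)
weights = tf_idf(docs)
# Both terms occur in 2 of 4 documents, so both receive IDF = log(2).
print(idf_values["outage"], idf_values["grid"])
# The label information is ignored: "outage" is confined to one category, "grid" is not.
print(category_spread(corpus, "outage"), category_spread(corpus, "grid"))
```

Because IDF depends only on how many documents contain a term, `outage` (concentrated in one category) and `grid` (spread over two) receive identical weights in the first document. The abstract motivates category-aware factors such as DC by exactly this gap.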