A Study on Text Classification: Term Weighting Algorithm Analysis

2021 
With advances in digital recording and storage technology and the rapid growth of the World Wide Web, people now write and record in digital text rather than on paper. As more text applications emerge, text classification has been attracting growing attention. Automatic text classification through machine learning relies on five related techniques that are widely discussed in the literature: pre-processing, feature extraction, feature selection, term weighting, and the classification algorithm. In this paper, we explore the impact of term weighting on text classification. Term weighting is an important part of text classification: the calculated weight should directly reflect the importance of a term in the entire text so that the learning algorithm can achieve the best classification result. We applied several common term weighting methods to several pre-defined datasets and conducted experiments. Contrary to the intuitive assumption that the value of a weight represents how important a term is, the results show that a term may not be as important as its high weight suggests.
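
As an illustration of what a term weight is, the following is a minimal sketch of TF-IDF, one widely used weighting scheme. The abstract does not name the methods that were evaluated, so the choice of TF-IDF, the function name tf_idf, and the toy documents are assumptions made here for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Illustrative TF-IDF variant (an assumption, not the paper's method):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, tokenized on whitespace.
docs = [
    "the match ended in a draw".split(),
    "the team won the match".split(),
    "the market fell sharply".split(),
]
weights = tf_idf(docs)
```

In this toy corpus, "the" occurs in every document and receives a weight of zero, while every term that occurs in only one document receives the highest weight in its document simply because it is rare. This is one way a high weight can fail to track how useful a term is for classification.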