A Self-Pruning Classification Model for News

2019 
News aggregators are online services that collect articles from numerous reputable media and news providers and reorganize them in a convenient manner to help users access the information they seek. One of the most important tools offered by news aggregators is the classification of articles into a fixed set of categories. In this article, we introduce a supervised classification method for news articles that analyzes their titles and constructs multiple types of tokens, including single words and n-grams of variable sizes. It then employs several statistics, such as frequencies and token-class correlations, to assign two importance scores to each token. These scores reflect the ambiguity of a token, namely, how significant it is for classifying an article into a category. The tokens and their scores are stored in a support structure that is subsequently used to classify unlabeled articles. In addition, we propose a dimensionality reduction approach that reduces the size of the model without significantly degrading its classification performance. The algorithm is evaluated experimentally on a popular dataset of news articles and is found to outperform standard classification methods.
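
The pipeline outlined in the abstract (title tokenization into words and n-grams, token scoring, lookup-based classification, and pruning) can be illustrated with a minimal Python sketch. The abstract does not give the scoring formulas, so everything here is an assumption made for illustration: the function names `tokenize`, `train`, `classify`, and `prune`, the single combined score (class share weighted by a class-spread factor) that stands in for the paper's two importance scores, and the threshold-based pruning rule.

```python
from collections import Counter, defaultdict
import math


def tokenize(title, max_n=3):
    """Return single words plus word n-grams of up to max_n words."""
    words = title.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


def train(titles, labels, max_n=3):
    """Build a token -> {class: score} dictionary from labeled titles."""
    token_class = defaultdict(Counter)  # token -> per-class title counts
    for title, label in zip(titles, labels):
        for tok in set(tokenize(title, max_n)):
            token_class[tok][label] += 1
    n_classes = len(set(labels))
    model = {}
    for tok, counts in token_class.items():
        total = sum(counts.values())
        # Assumed ambiguity factor: smaller when a token occurs in many classes.
        spread = math.log(n_classes / len(counts)) + 1.0
        # Assumed score: share of the token's occurrences in each class,
        # weighted by the ambiguity factor.
        model[tok] = {c: (k / total) * spread for c, k in counts.items()}
    return model


def classify(model, title, max_n=3):
    """Sum the per-class scores of known tokens and return the top class."""
    votes = Counter()
    for tok in tokenize(title, max_n):
        for cls, score in model.get(tok, {}).items():
            votes[cls] += score
    return votes.most_common(1)[0][0] if votes else None


def prune(model, min_score):
    """Assumed pruning rule: drop tokens whose best class score is below min_score."""
    return {t: s for t, s in model.items() if max(s.values()) >= min_score}


# Toy data, invented for this example only.
titles = [
    "stocks rally on wall street",
    "banks report final quarter profits",
    "team wins championship final",
    "star striker scores in final",
]
labels = ["business", "business", "sports", "sports"]

model = train(titles, labels)
pruned = prune(model, min_score=1.0)
print(classify(pruned, "wall street stocks"))
```

In this toy data the word "final" occurs in both classes, so it receives a low score and is dropped by `prune`, while class-specific tokens such as "wall street" are kept. This mirrors the idea of shrinking the model by discarding ambiguous tokens, although the paper's actual criterion may differ.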