Effects of dimensionality reduction and feature selection in text classification

2011 
The goal of classifying text, or data in general, is to reduce the time needed to access information. The continuously increasing number of documents makes manual classification impractical, so automatic text classification systems are used instead. In these systems, the large size of the data space is a significant problem. Applying dimensionality reduction and feature selection techniques in text classification systems makes it possible to classify correctly with a reduced amount of data. In this study, the Discrete Cosine Transform (DCT) method and feature selection with the Proportion of Variance method are proposed, with the aim of obtaining more effective classification results and a shorter classification time. The experiments use the WebKB dataset and the R8 dataset from Reuters-21578. With the DCT method, classification success is largely preserved, and with the Proportion of Variance method, classification success increases.
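The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of either method, so everything below is an assumption made for illustration: the DCT reduction is taken to mean keeping the first `keep` coefficients of an unnormalized DCT-II of a document's term vector, and Proportion of Variance is taken to mean keeping the highest-variance features until they account for a chosen share of the total variance. The function names and parameters are hypothetical and are not taken from the study.

```python
import math


def dct_ii(vector):
    """Unnormalized DCT-II of a 1-D sequence of numbers."""
    n = len(vector)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(vector))
        for k in range(n)
    ]


def reduce_with_dct(vector, keep):
    """Assumed reduction step: keep only the first `keep` DCT coefficients."""
    return dct_ii(vector)[:keep]


def select_by_proportion_of_variance(rows, proportion):
    """Assumed selection step: return the column indices with the highest
    variance whose summed variance reaches `proportion` of the total."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    variances = []
    for j in range(n_cols):
        column = [row[j] for row in rows]
        mean = sum(column) / n_rows
        variances.append(sum((v - mean) ** 2 for v in column) / n_rows)

    total = sum(variances)
    order = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)

    selected = []
    cumulative = 0.0
    for j in order:
        # Stop once the selected features explain enough of the variance.
        if total == 0 or cumulative / total >= proportion:
            break
        selected.append(j)
        cumulative += variances[j]
    return sorted(selected)


if __name__ == "__main__":
    # Toy document-term matrix: rows are documents, columns are term weights.
    docs = [[0, 1, 5], [0, 1, 5], [10, 1, 6], [10, 1, 6]]
    print(select_by_proportion_of_variance(docs, 0.9))
    print(reduce_with_dct(docs[2], 2))
```

In this sketch a constant column contributes no variance and is never selected, while the DCT step shortens each document vector to a fixed length regardless of vocabulary size. Either reduced representation would then be passed to a classifier, which is outside the scope of the example.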