Improving performance of Text Categorization: Using Multi Feature Coselection Clustering Technique and Lsquare Machine Learning

2010 
Text categorization is continuing to be one of the most researched NLP problems due to the ever-increasing amounts of electronic documents and digital libraries. In this paper, we present a novel text categorization method that combines the Multitype Features Coselection for Clustering and a learning logic technique, called Lsquare, for constructing text classifiers. The high dimensionality of text in a document has not been fruitful for the task of categorization, for which reason, feature clustering has been proven to be an ideal alternative to feature selection for reducing the dimensionality. We, therefore, use Multitype Features Coselection for Clustering (MFCC) to generate an efficient representation of documents and apply Lsquare for training text classifiers. The method was extensively tested and evaluated. The proposed method achieves higher or comparable classification accuracy and F1 results compared with SVM. MFCC improves clustering performance.
    • Correction
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    1
    Citations
    NaN
    KQI
    []