A new feature weighting method based on probability distribution in imbalanced text classification

2010 
Many real-world text classification tasks involve imbalanced training examples. Categories with fewer examples are under-represented, and their classifiers often perform far below a satisfactory level. We propose a new approach that uses a probability distribution to assign feature weights, and we apply it to the Naive Bayes classifier. We evaluate the method on the FuDan Chinese Corpus. The experimental results show significant improvement on imbalanced datasets, while performance on balanced datasets is not jeopardized. Our approach offers a simple and effective way to boost text classification performance on skewed datasets.
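The abstract does not state the weighting formula. The minimal sketch below therefore only illustrates, under stated assumptions, how a per-feature weight derived from a probability distribution can plug into a multinomial Naive Bayes classifier: each term's log-likelihood log P(t|c) is multiplied by a weight w_t. The weight used here, w_t = max_c P(t|c) / sum_c P(t|c), is a hypothetical stand-in chosen for illustration and is not the authors' method. The functions `train` and `predict`, the smoothing parameter `alpha`, and the toy data are likewise hypothetical.

```python
import math
from collections import Counter

def train(docs, labels, alpha=1.0):
    """Fit a feature-weighted multinomial Naive Bayes model.

    docs: list of token lists; labels: list of class names.
    HYPOTHETICAL weighting scheme, not the one from the paper.
    """
    classes = sorted(set(labels))
    vocab = sorted({t for d in docs for t in d})
    class_docs = Counter(labels)
    term_counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        term_counts[c].update(doc)

    # Laplace-smoothed P(t|c), normalized within each class, so a large
    # class does not inflate its term probabilities just by having more text.
    cond = {}
    for c in classes:
        total = sum(term_counts[c].values()) + alpha * len(vocab)
        cond[c] = {t: (term_counts[c][t] + alpha) / total for t in vocab}

    # Hypothetical weight: how concentrated the term's distribution over
    # classes is. Ranges from 1/len(classes) (uniform) up to 1 (one class).
    weights = {}
    for t in vocab:
        probs = [cond[c][t] for c in classes]
        weights[t] = max(probs) / sum(probs)

    priors = {c: class_docs[c] / len(labels) for c in classes}
    return classes, priors, cond, weights

def predict(model, doc):
    """Return the class maximizing log P(c) + sum_t w_t * log P(t|c)."""
    classes, priors, cond, weights = model
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log(priors[c])
        for t in doc:
            if t in cond[c]:  # terms never seen in training are skipped
                score += weights[t] * math.log(cond[c][t])
        if score > best_score:
            best, best_score = c, score
    return best

# Toy, deliberately imbalanced data: three "sport" documents, one "finance".
docs = [["ball", "goal"], ["goal", "match"], ["ball", "team"], ["stock", "market"]]
labels = ["sport", "sport", "sport", "finance"]
model = train(docs, labels)
print(predict(model, ["stock", "market"]))
```

In this sketch the class prior still favors the majority class; only the term weights are changed, which mirrors the abstract's framing of the contribution as a feature weighting method layered on top of an otherwise standard Naive Bayes classifier.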