Active Learning and Active Noise Correction for Document Classification

2015 
Abstract: This paper introduces two novel techniques that improve document classification while reducing the amount of manual work by the user. The first technique applies uncertainty sampling as a metric for batch-mode active learning to suggest only the most interesting documents for the manual labeling process, resulting in a steep improvement even for small training sets. This addresses the problem of creating and improving an initial training set. The second technique focuses on cleaning an existing large set of weakly labeled documents by active noise correction. The classifier's self-assessment is used to detect mislabeled documents, which are then reclassified. For active noise correction, two approaches are explored: one based on a human expert and one that automatically corrects the assigned labels.
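
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the least-confidence score, the confidence threshold, and all function names (`least_confidence`, `select_batch`, `flag_suspected_noise`) are assumptions chosen for illustration, and the class probabilities are taken as given from some already trained classifier.

```python
from typing import Dict, List, Sequence, Tuple


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score (assumed metric): 1 minus the top class probability."""
    return 1.0 - max(probs)


def select_batch(pool_probs: Dict[str, Sequence[float]], batch_size: int) -> List[str]:
    """Return ids of the batch_size most uncertain unlabeled documents."""
    ranked = sorted(
        pool_probs,
        key=lambda doc_id: least_confidence(pool_probs[doc_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size]


def flag_suspected_noise(
    labeled_probs: Dict[str, Tuple[int, Sequence[float]]],
    threshold: float,
) -> Dict[str, int]:
    """Map document id to the predicted label wherever the classifier
    confidently disagrees with the assigned (possibly noisy) label.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    suggestions: Dict[str, int] = {}
    for doc_id, (assigned, probs) in labeled_probs.items():
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != assigned and probs[predicted] >= threshold:
            suggestions[doc_id] = predicted
    return suggestions
```

In the expert-based variant described above, the flagged documents would be shown to a human for review; in the automatic variant, the suggested label would simply replace the assigned one.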