Instance Selection by Border Sampling in Multi-class Domains

2009 
Instance selection is a pre-processing technique for machine learning and data mining. Its main problem is that previous approaches still have difficulty producing effective samples for training classifiers. Recent research has proposed a new sampling technique, Progressive Border Sampling (PBS), which produces a small sample from the original labelled training set by identifying and augmenting border points. However, border sampling on multi-class domains is not trivial, and in practical applications training sets contain much redundancy and noise. In this work, we discuss several issues related to PBS and show that it can produce effective samples for training classifiers by removing redundancy and noise from training sets. We compare this new technique with previous instance selection techniques for learning classifiers, especially Naive Bayes-like classifiers, on multi-class domains, with the exception of one binary case drawn from a practical application.