Instance selection based on boosting for instance-based learners

Aida de Haro-García, Gonzalo Cerruela García, Nicolás García Pedrajas


2019

Abstract

Instance selection is one of the most important preprocessing steps in many machine learning tasks. Because of the large size of the datasets common in many current problems, removing redundant, useless, erroneous or noisy instances is a frequent initial step performed before other data mining algorithms are applied. Instance selection, as part of this data reduction task, is a relevant problem in current data mining research. Many instance selection methods hypothesize a certain way of characterizing the most important instances and then implement an algorithm to keep them. The problem with these methods is that their success depends on the data fulfilling the underlying hypothesis. Other methods simply add or delete instances, considering only their effect on the accuracy of the nearest neighbor rule. In this paper, we present a new method of this second kind that uses boosting to obtain a subset of instances that improves on the classification accuracy obtained with the whole dataset while achieving a significant reduction in size. Instances are added incrementally by selecting those that maximize the accuracy of the subset, combining the instance weighting used in the construction of ensembles of classifiers with the step-wise addition of new instances. The method is compared with standard methods on a large set of 205 datasets and shows the best overall performance.
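The abstract describes the method only at a high level. The following is a minimal Python sketch of the general idea under our own assumptions, not the authors' algorithm: a 1-NN rule with squared Euclidean distance, AdaBoost-style instance weights, greedy addition of the candidate that maximizes weighted training accuracy, a weight reset when the weighted error reaches 0.5, and a fixed maximum subset size as the stopping rule. The function names (`nn_predict`, `predictions`, `boosted_instance_selection`) and the `max_size` parameter are hypothetical.

```python
import math


def nn_predict(x, proto_X, proto_y):
    """Return the label of the nearest prototype (squared Euclidean distance)."""
    best = min(range(len(proto_X)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, proto_X[k])))
    return proto_y[best]


def predictions(X, y, subset):
    """1-NN predictions for every training instance using only `subset` as prototypes."""
    px = [X[k] for k in subset]
    py = [y[k] for k in subset]
    return [nn_predict(x, px, py) for x in X]


def boosted_instance_selection(X, y, max_size=10):
    """Hypothetical sketch: greedily add the instance that maximizes the
    weighted 1-NN accuracy of the subset, reweighting AdaBoost-style."""
    n = len(X)
    w = [1.0 / n] * n  # boosting weights over the training instances
    selected = []
    while len(selected) < min(max_size, n):
        # Try every unselected instance and keep the best weighted accuracy.
        best_j, best_acc = None, -1.0
        for j in range(n):
            if j in selected:
                continue
            pred = predictions(X, y, selected + [j])
            acc = sum(wi for wi, p, t in zip(w, pred, y) if p == t)
            if acc > best_acc:
                best_j, best_acc = j, acc
        selected.append(best_j)

        # Weighted error of the current subset used as a 1-NN classifier.
        pred = predictions(X, y, selected)
        eps = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
        if eps == 0.0:  # the subset already classifies every instance correctly
            break
        if eps >= 0.5:  # assumption: reset to uniform weights, as some boosting variants do
            w = [1.0 / n] * n
            continue
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        # Raise the weight of misclassified instances and lower the rest.
        w = [wi * math.exp(alpha if p != t else -alpha)
             for wi, p, t in zip(w, pred, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return selected
```

On two well-separated clusters, for example `X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]` with labels `[0, 0, 0, 1, 1, 1]`, the sketch stops after selecting one prototype per class. Evaluating every candidate on every round costs on the order of n² · |S| distance computations, so a practical implementation would need caching or a faster neighbor search.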

