Ensembles of Feature Selectors for dealing with Class-Imbalanced Datasets: A proposal and comparative study

2020 
Abstract Feature selection is an important task in many machine learning and data mining problems. Due to the increasing size of datasets, the removal of redundant, erroneous or noisy features is frequently an initial step. In many common applications, datasets have an imbalanced class distribution. Although feature selection suffers in the presence of an uneven distribution of samples among the classes, there are few methods specifically designed for this situation. We propose a new approach based on feature selection boosting to address this problem. Ensembles of feature selectors, built from both standard methods and methods specifically designed for class-imbalanced datasets, are constructed using boosting algorithms. These ensembles are used to perform feature selection in class-imbalanced datasets. The combination of different rounds of feature selectors over different samples, drawn from an adaptive distribution over the instances, achieves improved performance compared with standard feature selection methods on class-imbalanced datasets. A comprehensive set of experiments employing 18 different ensemble methods, 7 different feature selection methods and 140 class-imbalanced datasets demonstrates the efficiency of the ensemble approach in terms of reduction ability and classification performance. Further study of the proposed method shows its robustness in the presence of class label noise.
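The boosting-driven construction the abstract describes — resampling instances from an adaptive weight distribution, running a feature selector on each sample, and aggregating the per-round selections — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact method: the filter selector (class-mean separation), the weak learner (nearest class centroid), and all names are assumptions.

```python
import numpy as np

def boosted_feature_selection(X, y, n_rounds=5, k=3, seed=0):
    """Illustrative sketch of an ensemble of feature selectors built with
    AdaBoost-style instance reweighting. y is assumed binary (0/1)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(n, 1.0 / n)        # adaptive distribution over the instances
    votes = np.zeros(d)            # how often each feature is selected
    for _ in range(n_rounds):
        # draw a bootstrap sample according to the current weights
        idx = rng.choice(n, size=n, replace=True, p=w)
        Xs, ys = X[idx], y[idx]
        if not ((ys == 1).any() and (ys == 0).any()):
            continue  # degenerate resample with one class only: skip round
        # simple filter selector: class-mean separation per feature
        mu1, mu0 = Xs[ys == 1].mean(axis=0), Xs[ys == 0].mean(axis=0)
        score = np.abs(mu1 - mu0) / (Xs.std(axis=0) + 1e-12)
        top = np.argsort(score)[-k:]          # top-k features this round
        votes[top] += 1
        # weak classifier on the selected features: nearest class centroid
        c1 = Xs[ys == 1][:, top].mean(axis=0)
        c0 = Xs[ys == 0][:, top].mean(axis=0)
        pred = (np.linalg.norm(X[:, top] - c1, axis=1)
                < np.linalg.norm(X[:, top] - c0, axis=1)).astype(int)
        err = float(np.clip(w[pred != y].sum(), 1e-12, 1 - 1e-12))
        alpha = 0.5 * np.log((1 - err) / err)
        # boosting step: up-weight misclassified (often minority) instances
        w = w * np.exp(alpha * (pred != y))
        w /= w.sum()
    return np.argsort(votes)[-k:]  # features selected most often overall
```

The aggregation here is a plain majority vote over rounds; the paper's actual combination schemes and selectors differ, but the skeleton — reweight, resample, select, combine — is the pattern the abstract outlines.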