Stability and Accuracy Analysis of a Feature Selection Ensemble for Binary Classification in Biomedical Datasets

2018 
In recent decades, major advances have been made in the way we collect and generate data in all aspects of life. At the same time, the technical approaches used to analyze and interpret these datasets have improved. The intersection of these trends is called Big Data, and it plays an important role in precision medicine and other fields of computer science. Feature selection algorithms have brought considerable progress in coping with this rapidly growing amount of machine-readable information. They can identify a minimal set of relevant features from which prediction models with high accuracy can be developed. Feature selection thus improves both the interpretability and the computability of big datasets. Many different feature selection methods already exist, and previous studies have shown that some of them are biased depending on the feature type and dataset quality. In this work, an ensemble feature selection (EFS) approach consisting of eight different feature selection methods is introduced. An ensemble of learning algorithms has the advantage of alleviating the biases of the individual approaches. Additionally, EFS provides a cumulative quantitative feature ranking. EFS was applied to several biomedical datasets. Different feature subsets resulting from the EFS ranking were evaluated with three popular prediction models, namely logistic regression, random forest, and support vector machine. In most cases, a significant improvement in prediction performance was achieved compared to the same models built with all features. The EFS approach and the evaluations were implemented as an R package, EFS, as well as a web application. A quantitative feature ranking and a cumulative barplot of the features' importance values are provided as output.
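The idea of a cumulative quantitative ranking can be illustrated with a minimal Python sketch. It is not the implementation of the EFS R package: the function names, the max-based normalization, the equal weighting of methods, and the example scores are all assumptions made for illustration. Each method's importance scores are rescaled to [0, 1], weighted by 1/(number of methods), and summed per feature.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one method's scores to [0, 1] by dividing by the maximum.

    Assumes non-negative importance scores (an illustrative simplification).
    """
    top = max(scores.values())
    if top <= 0:
        return {name: 0.0 for name in scores}
    return {name: value / top for name, value in scores.items()}


def ensemble_ranking(
    per_method: Dict[str, Dict[str, float]]
) -> List[Tuple[str, float]]:
    """Sum the normalized, equally weighted scores of all methods per feature.

    Returns (feature, cumulative score) pairs, highest score first.
    """
    n_methods = len(per_method)
    totals: Dict[str, float] = {}
    for scores in per_method.values():
        for name, value in normalize(scores).items():
            totals[name] = totals.get(name, 0.0) + value / n_methods
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical importance scores from two hypothetical selection methods.
example_scores = {
    "method_a": {"age": 4.0, "bmi": 2.0, "noise": 0.0},
    "method_b": {"age": 8.0, "bmi": 2.0, "noise": 1.0},
}
ranking = ensemble_ranking(example_scores)
for feature, score in ranking:
    print(f"{feature}: {score:.4f}")
```

Because every method contributes at most 1/(number of methods) to a feature, the cumulative score stays within [0, 1], and the per-method contributions could be stacked in a barplot of the kind the abstract describes.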