An ensemble feature selection method for high-dimensional data based on sort aggregation

2019 
With the rapid development of the Internet, big data is now used in a wide range of applications. However, high-dimensional data often contain redundant or irrelevant features, which makes feature selection particularly important. Because the feature subset obtained by a single feature selection method may be biased, this paper proposes SA-EFS, an ensemble feature selection method based on sort aggregation and oriented to classification tasks. For high-dimensional data sets, the results of three feature selection methods (chi-square test, maximal information coefficient, and XGBoost) are aggregated by a specific strategy. The integration effects of the arithmetic mean and geometric mean aggregation strategies on this model are analyzed. To evaluate the classification and prediction performance of the selected feature subset, three well-performing classifiers (KNN, Random Forest, and XGBoost) are tested, and the influence of threshold on classification performance ...
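The sort-aggregation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption: each selector's scores are turned into ranks (rank 1 is the best feature), the ranks are combined per feature by arithmetic or geometric mean, and the threshold is read as the fraction of features to keep. The function names and the score values are hypothetical, standing in for the outputs of the chi-square test, the maximal information coefficient, and XGBoost feature importance.

```python
import math

def scores_to_ranks(scores):
    """Convert scores to ranks: the highest score gets rank 1 (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        ranks[feature] = position
    return ranks

def aggregate_ranks(rank_lists, strategy="arithmetic"):
    """Combine several rank lists into one aggregated rank per feature."""
    n_selectors = len(rank_lists)
    aggregated = []
    for j in range(len(rank_lists[0])):
        column = [ranks[j] for ranks in rank_lists]
        if strategy == "arithmetic":
            aggregated.append(sum(column) / n_selectors)
        elif strategy == "geometric":
            # Geometric mean computed in log space; ranks start at 1, so log is defined.
            aggregated.append(math.exp(sum(math.log(r) for r in column) / n_selectors))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return aggregated

def select_features(aggregated, threshold):
    """Keep the best `threshold` fraction of features (assumed meaning of the threshold)."""
    k = max(1, math.ceil(threshold * len(aggregated)))
    order = sorted(range(len(aggregated)), key=lambda i: (aggregated[i], i))
    return sorted(order[:k])

# Made-up scores for four features from three selectors (illustrative only).
chi2_scores = [10.0, 2.0, 7.5, 0.3]
mic_scores = [0.8, 0.1, 0.9, 0.2]
xgb_scores = [0.5, 0.05, 0.3, 0.15]

rank_lists = [scores_to_ranks(s) for s in (chi2_scores, mic_scores, xgb_scores)]
arithmetic = aggregate_ranks(rank_lists, "arithmetic")
geometric = aggregate_ranks(rank_lists, "geometric")
selected = select_features(arithmetic, threshold=0.5)
```

With these made-up scores, both strategies order the features the same way. In general the geometric mean gives more weight to a feature that any single selector ranks near the top, so the two strategies can select different subsets.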