A new two-stage hybrid feature selection algorithm and its application in Chinese medicine

2021 
High-dimensional, small-sample data are prone to the curse of dimensionality and to overfitting, and they contain many irrelevant and redundant features. To address these feature selection problems, a new Two-stage Hybrid Feature Selection Algorithm (Ts-HFSA) is proposed. The first stage combines the Filter method with the Wrapper method to adaptively remove irrelevant features. The second stage uses a De-redundancy Algorithm of Fusing Approximate Markov Blanket with L1 Regular Term (DA2MBL1) to address two problems of the approximate Markov blanket (AMB): information loss when redundant features are deleted, and potential redundancy remaining in the feature subset obtained by AMB. Experimental results on multiple UCI public datasets and on datasets from the material foundation of Chinese medicine showed that Ts-HFSA removed irrelevant and redundant features more effectively, found smaller and higher-quality feature subsets, and improved stability, indicating that it offers advantages over AMB, FCBF, RF, GBDT, XGBoost, Lasso, and CI_AMB. Moreover, on the material-foundation data of Chinese medicine, which have higher feature dimensions and smaller sample sizes, Ts-HFSA performed better, and it also improved model precision after greatly reducing dimensionality. These results indicate that Ts-HFSA is an effective feature selection method for high-dimensional, small-sample data and an excellent research method for the material foundation of Chinese medicine.
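The approximate Markov blanket that Ts-HFSA builds on can be illustrated with a small sketch. The code below is not Ts-HFSA or DA2MBL1; it is a minimal, hypothetical illustration of the generic AMB criterion used by FCBF, assuming discrete features and using symmetric uncertainty (SU) as the correlation measure. A fixed SU threshold stands in for the relevance filter; the Wrapper step and the L1 regular term of the paper are not shown. The names `symmetric_uncertainty` and `amb_select` and the `relevance_threshold` parameter are illustrative assumptions. Under this criterion, feature F_i is an approximate Markov blanket for F_j when SU(F_i, C) >= SU(F_j, C) and SU(F_i, F_j) >= SU(F_j, C), in which case F_j is dropped as redundant.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant: treat as uncorrelated
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def amb_select(features, target, relevance_threshold=0.0):
    """Hypothetical FCBF-style selection (not the paper's DA2MBL1).

    features: dict mapping feature name -> list of discrete values
    target:   list of class labels
    """
    # Relevance filter: SU between each feature and the class.
    relevance = {name: symmetric_uncertainty(col, target)
                 for name, col in features.items()}
    # Keep features above the threshold, most relevant first.
    ranked = sorted((name for name in features
                     if relevance[name] > relevance_threshold),
                    key=lambda name: relevance[name], reverse=True)
    selected = []
    for fj in ranked:
        # Every already-selected fi has SU(fi, C) >= SU(fj, C) because of
        # the sort order, so fi is an AMB for fj if SU(fi, fj) >= SU(fj, C).
        redundant = any(
            symmetric_uncertainty(features[fi], features[fj]) >= relevance[fj]
            for fi in selected
        )
        if not redundant:
            selected.append(fj)
    return selected
```

For example, with labels `y = [0, 0, 1, 1, 0, 1, 0, 1]`, a feature identical to `y`, an exact duplicate of it, and a feature `[0, 0, 0, 0, 1, 1, 1, 1]` that is independent of `y`, the filter drops the independent feature (SU = 0) and the AMB step drops the duplicate, leaving only the first feature. The greedy pass in descending relevance order is what makes the first condition of the AMB definition hold automatically for every candidate blanket.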