An RF-BFE algorithm for feature selection in radiomics analysis

2019 
Radiomics analysis has shown considerable potential for treatment assessment, cancer genetics analysis and clinical decision support. A broad set of quantitative features extracted from medical images is used to build a descriptive and predictive model that relates the image features to phenotypes or gene-protein signatures. As a common wrapper strategy, the Backward Feature Elimination (BFE) algorithm is widely used to reduce the dimensionality of the feature space. In this paper, we propose an effective BFE algorithm that uses Random Forest (RF) to automatically select the optimal feature subset, and we apply it to predicting EGFR mutations from CT images. First, the whole dataset was shuffled and the features were ranked by RF importance measures. Then, LASSO regression was used iteratively to perform both regularization and accuracy calculation within the BFE, which ended when further removals no longer improved the result; repeating this yielded a series of feature subsets. Finally, we gathered all the feature subsets in a feature counter and determined the final feature subset by hard voting with equal weight. The dataset consisted of 130 CT image series of EGFR-mutated lung adenocarcinoma harboring Ex19 (n=56) or Ex21 (n=74), and more than 2000 radiomic features were extracted from each series. Seven features were selected to predict EGFR mutation, all of them derived from wavelet- and Gabor-filtered images. The selected set reached its best classification result (AUC 0.74; 95% CI, 0.67-0.84) with the K-nearest neighbors (KNN) model.
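The elimination and voting steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `backward_eliminate` and `hard_vote`, the `min_votes` threshold, and the toy scoring function are all hypothetical. In the paper the rankings come from RF importance on the shuffled dataset and the score from LASSO regression; here both are replaced by hard-coded stand-ins so the example runs with the standard library alone.

```python
from collections import Counter
from typing import Callable, List, Sequence


def backward_eliminate(ranked: Sequence[str],
                       score: Callable[[List[str]], float]) -> List[str]:
    """Drop the lowest-ranked feature one at a time.

    `ranked` is ordered from most to least important. Elimination stops
    as soon as removing the next feature does not improve the score
    (an assumed reading of the stopping rule in the abstract).
    """
    current = list(ranked)
    best = score(current)
    while len(current) > 1:
        candidate = current[:-1]          # remove the least important feature
        candidate_score = score(candidate)
        if candidate_score <= best:       # no improvement: stop
            break
        current, best = candidate, candidate_score
    return current


def hard_vote(subsets: Sequence[Sequence[str]], min_votes: int) -> List[str]:
    """Equal-weight hard voting: each subset casts one vote per feature.

    `min_votes` is a hypothetical threshold; the abstract does not state
    how the vote count is turned into the final subset.
    """
    counts = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, c in counts.items() if c >= min_votes)


# Toy stand-in for the accuracy estimate: reward informative features,
# penalise subset size. Purely illustrative.
INFORMATIVE = {"a", "b"}


def toy_score(features: List[str]) -> float:
    return 10 * len(INFORMATIVE & set(features)) - len(features)


# Three hypothetical rankings, as if produced by three shuffled RF runs.
rankings = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["a", "c", "b", "d"],
]

subsets = [backward_eliminate(r, toy_score) for r in rankings]
final_subset = hard_vote(subsets, min_votes=2)
print(subsets)
print(final_subset)
```

Because each run starts from a different ranking, the surviving subsets can differ; counting how often each feature survives and keeping the frequently selected ones is what makes the final subset less sensitive to any single shuffle.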