Feature Selection Based on Shapley Additive Explanations on Metagenomic Data for Colorectal Cancer Diagnosis

2021 
Personalized medicine is currently one of the most active approaches to protecting and improving human health. Scientists working on personalized medicine projects often regard metagenomic data as a valuable source for developing and proposing disease treatment methods. Processing metagenomic data is challenging, however, because of its high dimensionality and complexity. Numerous studies have attempted to find biomarkers, that is, medical signs significantly related to diseases. In this study, we propose an approach based on Shapley Additive Explanations, a model explainability technique, to select valuable features from metagenomic data and improve disease prediction. The proposed feature selection method is evaluated on more than 500 colorectal cancer samples from various geographic regions, including France, China, the United States, Austria, and Germany. The set of 10 features selected with Shapley Additive Explanations achieves significant results compared with the feature selection method based on the Pearson coefficient, and it performs comparably to the original feature set of approximately 2000 features.
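The selection idea described above, ranking features by their Shapley-based importance and keeping the top few, can be illustrated with a minimal sketch. It is not the implementation used in the study: the abstract does not specify the prediction model or the explainer, so the linear `toy_predict` model, the four-feature `toy_samples` table, and the helper names `shapley_values` and `select_top_k` are all hypothetical. The sketch computes exact Shapley values by enumerating feature coalitions, which is only feasible for a handful of features; with around 2000 features, an approximate explainer would be needed.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their background
    (here: dataset mean) value. Cost grows as 2^n, so toy sizes only.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else background[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def select_top_k(predict, samples, k):
    """Rank features by mean absolute Shapley value and keep the top k."""
    n = len(samples[0])
    background = [sum(row[j] for row in samples) / len(samples) for j in range(n)]
    importance = [0.0] * n
    for row in samples:
        phi = shapley_values(predict, row, background)
        for j in range(n):
            importance[j] += abs(phi[j]) / len(samples)
    ranked = sorted(range(n), key=lambda j: importance[j], reverse=True)
    return ranked[:k], importance


# Hypothetical stand-in for a trained classifier score (assumption, not from the paper).
def toy_predict(z):
    return 3.0 * z[0] + 0.0 * z[1] - 1.0 * z[2] + 0.1 * z[3]


# Hypothetical abundance-like values: 4 samples x 4 features.
toy_samples = [
    [0.1, 0.9, 0.3, 0.5],
    [0.8, 0.2, 0.7, 0.4],
    [0.5, 0.5, 0.1, 0.9],
    [0.3, 0.7, 0.6, 0.2],
]

selected, importance = select_top_k(toy_predict, toy_samples, k=2)
print(selected)  # indices of the two most influential features
```

In this toy setting the model ignores feature 1 entirely, so its importance is zero and it is never selected; features 0 and 2 carry the largest weights and are kept. A baseline such as the Pearson coefficient would instead rank features by their correlation with the label, without reference to what the model actually uses.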