A topological AUC-based biomarker ensemble method for the complex disease analysis

2020 
Complex diseases are affected by many factors, and their pathogenic mechanisms are complicated, which makes the diseases difficult to analyze and treat. AUC, the area under the ROC curve, is often used as a gold standard for evaluating the performance of a binary classifier. Existing methods that construct a classifier by optimizing AUC are prone to falling into local optima and have high time complexity, which makes them unsuitable for real-time analysis of high-dimensional gene expression data. With the rapid development of high-throughput sequencing technology, feature selection and model estimation have become necessary means of reducing the dimension and complexity of the data, and the selected important features have the potential to serve as biomarkers that reveal the pathogenesis of diseases. In this paper, we propose a topological AUC-based biomarker ensemble method for complex disease analysis, which uses gene expression data and topological information derived from the protein-protein interaction network to identify biomarkers. The main contribution is to optimize two objectives simultaneously: maximizing the AUC score and minimizing the number of selected features. We applied the proposed method to two types of problems: 1) prognosis of breast cancer, and 2) classification of similar diseases. The results show that our method can effectively identify a small set of biomarkers with strong classification ability and biological interpretability.
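To make the two objectives named in the abstract concrete, the following is a minimal Python sketch, not the authors' method. It computes AUC as the fraction of positive-negative sample pairs that the classifier ranks correctly (ties count one half), and then combines AUC with the number of selected features in a single score. The function names, the `weight` parameter, and the linear penalty are assumptions made for illustration only; the abstract does not say how the two objectives are balanced.

```python
from typing import Sequence


def auc_score(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def penalized_objective(auc: float, n_selected: int, n_total: int,
                        weight: float = 0.1) -> float:
    """Hypothetical scalarization (an assumption, not from the paper):
    reward a high AUC and penalize the fraction of features selected."""
    return auc - weight * (n_selected / n_total)


if __name__ == "__main__":
    # Toy classifier scores and binary labels (1 = case, 0 = control).
    scores = [0.9, 0.8, 0.35, 0.4, 0.1]
    labels = [1, 1, 1, 0, 0]
    auc = auc_score(scores, labels)
    print(auc, penalized_objective(auc, n_selected=5, n_total=100))
```

The pairwise loop is quadratic in the number of samples, which is fine for a toy example; a rank-based computation would be the usual choice for larger data.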