Just Add Data: Automated Predictive Modeling and BioSignature Discovery

2020 
Fully automated machine learning, statistical modelling, and artificial intelligence for predictive modeling is becoming a reality, giving rise to the field of Automated Machine Learning (Au-toML). AutoML systems promise to democratize data analysis to non-experts, drastically in-crease productivity, improve replicability of the statistical analysis, facilitate the interpretation of results, and shield against common methodological analysis pitfalls. We present the basic ideas and principles of Just Add Data Bio (JADBIO), an AutoML technology applicable to the low-sample, high-dimensional omics data that arise in translational medicine and bioinformatics appli-cations. In addition to predictive and diagnostic models ready for clinical use, JADBIO also re-turns the corresponding biosignatures, i.e., minimal-size subsets of biomarkers that are jointly predictive of the outcome of interest. A use-case on thymic epithelial tumors is presented, along with an extensive evaluation on 374 public biological datasets. Results show that long-standing challenges with overfitting and overestimation of complex non-linear machine learning pipelines on high-dimensional, low small sample data can be overcome.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    59
    References
    13
    Citations
    NaN
    KQI
    []