Abstract 858: PrismML: A machine learning platform to query genotype-phenotype patterns in large genomics studies

2020 
Background: Large-scale genomics studies (e.g. AACR Project GENIE, TCGA, TOPMed) have sequenced thousands of patients in an attempt to understand disease-associated genomic variables and their clinical correlates. Existing online platforms (e.g. cBioPortal) enable simple gene-based queries, but do not allow more complex modeling to understand disease pathogenesis, risk and outcome. There is an urgent need for an interactive, modular and scalable platform that enables users to perform multivariate machine learning on existing genomic data. Results: We have built a platform, PrismML, that enables a user to interactively query a dataset and to run a multitude of machine learning tools, from simple statistical tests for differential analysis to multivariate modeling to predict clinical response or mortality risk. Because machine learning models are computationally intensive, we have used cloud computing to make the analyses faster and more scalable. Key features of our platform are: (1) availability of extensive statistical and machine learning methods; (2) implementation of best practices for machine learning, e.g. cross-validation; (3) graphical querying of results to understand the interplay among features. Users can choose to analyze existing data/studies, or upload their own data. Examples of possible queries: “identify genomic features that distinguish metastasis from primary tumors, either in a single cancer or pan-cancer” or “build a machine learning model to predict survival within ER- breast cancer patients”. There is also an acute need to integrate the knowledge extracted from multiple data types. To this end, we have integrated multiple data types into gene-scores, and have incorporated known biological/functional information by integrating gene-scores into pathway-scores. Summary: PrismML is an interactive and flexible platform to bring the power of machine learning and statistical modeling to the genomics community.
This is an active area of development, with multiple features in progress, such as integrating multiple datasets to increase statistical power in rare diseases and subsetting large disease cohorts to identify prognostic features.

Citation Format: Anupama Reddy, Daisy Flemming, Sara Selitsky, Ana Brandusa Pavel, Gabriela Alexe, Gyan Bhanot. PrismML: A machine learning platform to query genotype-phenotype patterns in large genomics studies [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 858.