Abstract 16678: Classification of Cardiovascular Proteins Involved in Coronary Atherosclerosis and Heart Failure Using Watson’s Cognitive Computing Technology

2017 
Introduction: Big data has the potential to deliver on the promise of personalized medicine by using sophisticated machine learning and other advanced analytic techniques to extract important insights regarding the complex interactions between biology and disease. We evaluated the ability of IBM Watson’s cognitive computing platform to correctly classify an expertly curated list of proteins known to be involved in specific cardiovascular disease phenotypes.

Methods: A list of 1,274 biologically relevant protein analytes from a commercial proteomic array (SOMAscan), spanning a diverse set of molecular functions and diseases, was used to generate training (2/3) and validation (1/3) datasets of proteins involved in the development of coronary atherosclerosis (n=45) and heart failure (n=81). Watson generated a predictive model by creating a text “fingerprint” for each training protein, based on the linguistic context in which the protein appears across all MEDLINE abstracts. Using a binary classification method, it then produced a ranked list of proteins that were semantically similar to the training proteins.

Results: Watson ranked the proteins in the validation set involved in coronary atherosclerosis (median rank: 25 vs. 626, p = 5×10⁻⁶) and heart failure (rank: 137 vs. 625, p = 1×10⁻¹¹) significantly higher than the other candidate proteins, with c-statistics of 0.88 for both [Figure].

Conclusions: These data demonstrate that Watson’s cognitive computing platform can correctly classify proteins involved in specific cardiovascular disease phenotypes using machine learning algorithms based on an annotated scientific corpus.
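
For readers less familiar with the metrics reported in the Results, the following is a minimal sketch of how median ranks and a c-statistic can be derived from a ranked candidate list. It is not Watson's implementation and does not reproduce the study data: the function name `c_statistic` and the toy ranks are hypothetical, and rank 1 is assumed to denote the candidate most similar to the training proteins.

```python
from statistics import median


def c_statistic(positive_ranks, negative_ranks):
    """Fraction of (positive, negative) pairs in which the positive
    is ranked better (smaller rank number); ties count as half.

    Assumption: rank 1 is the most similar candidate.
    """
    if not positive_ranks or not negative_ranks:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for p in positive_ranks:
        for n in negative_ranks:
            if p < n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positive_ranks) * len(negative_ranks))


# Hypothetical toy data: ranks of held-out disease proteins
# versus ranks of the other candidate proteins.
validation_ranks = [1, 2, 4]
other_ranks = [3, 5, 6, 7]

median_validation = median(validation_ranks)
median_other = median(other_ranks)
c_stat = c_statistic(validation_ranks, other_ranks)

print(f"median rank: {median_validation} vs. {median_other}")
print(f"c-statistic: {c_stat:.2f}")
```

A c-statistic of 0.5 corresponds to a ranking no better than chance, and 1.0 to a ranking that places every validation protein ahead of every other candidate. The pairwise loop is quadratic in the number of candidates, which is acceptable for a list of this size.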