Comparative analysis of EHR-based stroke phenotyping methods, their applications, and interpretation

2019 
Stroke is the second leading cause of death in the world and a top cause of disability in the US. The plurality of electronic health records (EHR) provides an opportunity to study this disease in situ. Doing so requires accurately identifying stroke patients from medical records. So-called "EHR phenotyping" algorithms, however, are difficult and time-consuming to create and often must rely on incomplete information. There is an opportunity to use machine learning to speed up and ease the process of cohort and feature identification. We systematically compared and evaluated the ability of several machine learning algorithms to automatically phenotype acute ischemic stroke patients. We found that these algorithms can achieve high performance (e.g. average AUROC=0.955) with little to no manual feature curation, and that other performance metrics differentiate each model's ability to generalize. We also found that commonly available data such as diagnosis codes can be used as noisy proxies for training when a reference panel of stroke patients is unavailable. Additionally, we found some limitations when the algorithms are used to place patients into stroke risk classes. We used these models to identify previously unidentified stroke patients from our patient population of 6.4 million and found expected rates of stroke across the population.