Development of phenotyping algorithms for the identification of organ transplant recipients: Cohort Study (Preprint)

2020 
BACKGROUND Studies involving organ transplant recipients (OTR) are often limited to the variables collected in the national Scientific Registry of Transplant Recipients database. The electronic health record (EHR) contains additional variables that can augment this data source if OTR can be identified accurately.

OBJECTIVE To develop phenotyping algorithms that identify OTR from the EHR.

METHODS We used Vanderbilt's de-identified version of its EHR database, which contains nearly 3 million subjects, to develop algorithms to identify OTR. We identified all 19,817 individuals with at least one ICD or CPT code for organ transplantation. We performed chart review on 1,350 randomly selected individuals to determine transplant status. We constructed machine learning models using the classification and regression trees, random forest, and extreme gradient boosting algorithms, and calculated positive predictive value (PPV) and sensitivity for combinations of codes.

RESULTS Of the 1,350 reviewed patient charts, 827 were transplant recipients, 511 had no record of a transplant, and 12 were equivocal. Most patients with only one or two transplant codes did not have a transplant. The most common reasons for being labeled a non-transplant patient were a lack of data (n = 229, 44.8%) or the patient being evaluated for an organ transplant (n = 174, 34.1%). All three machine learning algorithms identified OTR with overall PPV >90% and sensitivity >88%.

CONCLUSIONS EHRs linked to biobanks are increasingly used to conduct large-scale studies, but have not been well utilized in organ transplantation research. We present rigorously evaluated methods for phenotyping OTR from the EHR that will enable the use of the full spectrum of clinical data in transplant research. Using several different machine learning algorithms, we were able to identify transplant cases with high accuracy using only ICD and CPT codes.
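The two evaluation metrics reported above, positive predictive value and sensitivity, can be illustrated with a minimal sketch. It is not the authors' pipeline: the code-count threshold rule, the `MIN_CODES` value, and the toy `reviewed` records are all hypothetical and stand in for a fitted model and the chart-review labels.

```python
from typing import Iterable, Tuple


def ppv_and_sensitivity(
    predicted: Iterable[bool], actual: Iterable[bool]
) -> Tuple[float, float]:
    """Return (PPV, sensitivity) for predicted labels against reference labels."""
    tp = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1  # flagged as a recipient and confirmed by chart review
        elif p and not a:
            fp += 1  # flagged as a recipient, but chart review found no transplant
        elif a and not p:
            fn += 1  # true recipient that the rule missed
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical toy data: (number of transplant codes, chart-review label).
reviewed = [
    (1, False), (2, False), (1, True), (5, True),
    (8, True), (3, False), (12, True), (4, True),
]

MIN_CODES = 3  # hypothetical threshold, not a value taken from the study

predicted = [n_codes >= MIN_CODES for n_codes, _ in reviewed]
actual = [label for _, label in reviewed]

ppv, sensitivity = ppv_and_sensitivity(predicted, actual)
print(f"PPV = {ppv:.2f}, sensitivity = {sensitivity:.2f}")
```

In this toy set the rule flags five individuals, four of whom are confirmed recipients, and misses one recipient who has a single code. The same function could score the predictions of any classifier, including tree-based models, against chart-review labels.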