ABSTRACT Severity scores assess the acuity of critical illness by penalizing the deviation of physiologic measurements from normal and aggregating these penalties (also called “weights” or “subscores”) into a final score (or probability) quantifying the severity of critical illness (or the likelihood of in-hospital mortality). Although these simple additive models are human-readable and interpretable, their predictive performance needs further improvement. To address this need, we argue for replacing these simple additive models with models based on state-of-the-art non-linear supervised learning algorithms (e.g., Random Forest (RF) and eXtreme Gradient Boosting (XGB)). Specifically, we present OASIS+, a variant of the Oxford Acute Severity of Illness Score (OASIS) in which an ensemble of 200 decision trees is used to predict in-hospital mortality from the same 10 clinical variables used in OASIS. Using a test set of 9566 admissions extracted from the MIMIC-III database, we show that the performance of OASIS can be substantially improved, from an AUC of 0.77 to 0.83 with OASIS+. Moreover, we show that OASIS+ outperforms eight other commonly used severity scoring methods. Our results underscore the potential of improving existing severity scores by using more sophisticated machine learning algorithms (e.g., ensembles of non-linear decision trees), not just by including additional physiologic measurements.
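As a minimal sketch of the OASIS+ approach described above, the snippet below trains a 200-tree random forest to predict in-hospital mortality from the 10 OASIS input variables and reports a held-out AUC. The file path and column names are hypothetical placeholders, not taken from the study.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# The 10 clinical variables used by OASIS (column names are assumptions).
oasis_features = [
    "age", "gcs", "heart_rate", "mean_arterial_pressure", "respiratory_rate",
    "temperature", "urine_output", "ventilated", "elective_surgery", "pre_icu_los",
]

df = pd.read_csv("icu_admissions.csv")                 # hypothetical admissions extract
X, y = df[oasis_features], df["hospital_mortality"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Ensemble of 200 decision trees, as in the OASIS+ description above.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")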
Performance incentives for preventive care may encourage inappropriate testing, such as cancer screening for patients with short life expectancies. Defining screening colonoscopies for patients with a >50% 4-year mortality risk as inappropriate, the authors performed a pre-post analysis assessing the effect of introducing a cancer screening incentive on the proportion of screening colonoscopy orders that were inappropriate. Among 2078 orders placed by 23 attending physicians in 4 academic general internal medicine practices, only 0.6% (n = 6/1057) of screening colonoscopy orders in the preintervention period and 0.6% (n = 6/1021) of screening colonoscopy orders in the postintervention period were deemed “inappropriate.” This study found no evidence that the incentive led to an increase in inappropriate screening colonoscopy orders.
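The pre-post comparison above reduces to comparing two proportions (6/1057 vs 6/1021). The abstract does not name the statistical test; the sketch below uses Fisher's exact test as one reasonable choice for such small event counts.

from scipy.stats import fisher_exact

pre_inappropriate, pre_total = 6, 1057      # pre-intervention inappropriate / total orders
post_inappropriate, post_total = 6, 1021    # post-intervention inappropriate / total orders

table = [[pre_inappropriate, pre_total - pre_inappropriate],
         [post_inappropriate, post_total - post_inappropriate]]
odds_ratio, p_value = fisher_exact(table)

print(f"pre = {pre_inappropriate / pre_total:.1%}, "
      f"post = {post_inappropriate / post_total:.1%}, p = {p_value:.2f}")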
Females undergoing coronary artery bypass grafting (CABG) surgery have been reported to be at increased risk of postoperative mortality and comorbidity. Our main objective was to evaluate the impact of female sex on 30-day mortality after isolated CABG surgery. We created a retrospective cohort of adult patients who underwent isolated CABG surgery between 2006 and 2020 in a large rural healthcare system. Patients were grouped by sex, and 1:1 nearest-neighbor propensity score matching was performed to reduce bias due to potential confounding. The association between female sex and 30-day mortality was assessed using conditional regression analysis and statistical tests appropriate for matched analyses. Associations between female sex and eight secondary outcomes were also considered. Of 5616 adult patients who underwent isolated CABG surgery, 1352 were female. Propensity score matching yielded 1346 matched pairs with no significant imbalance observed for any of the included confounders. Conditional logistic regression showed an independent association between female sex and 30-day mortality (OR = 1.83, CI = 1.10-3.04, p = 0.02). Females undergoing isolated CABG surgery were at significantly greater risk of postoperative 30-day mortality and had a longer postoperative length of stay. Further research is needed to identify and address the causes of these disparities.
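The following is an illustrative sketch, not the study's code, of 1:1 nearest-neighbor propensity score matching followed by conditional logistic regression on the matched pairs. For simplicity it matches with replacement and without a caliper; the cohort file, confounder list, and column names are hypothetical.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from statsmodels.discrete.conditional_models import ConditionalLogit

df = pd.read_csv("cabg_cohort.csv")                      # hypothetical cohort extract
confounders = ["age", "diabetes", "ef", "creatinine", "urgent_status"]

# 1. Estimate the propensity of being female given the confounders.
ps_model = LogisticRegression(max_iter=1000).fit(df[confounders], df["female"])
df["ps"] = ps_model.predict_proba(df[confounders])[:, 1]

# 2. Match each female to the nearest male on the propensity score
#    (with replacement here, which the study likely did not do).
females, males = df[df["female"] == 1], df[df["female"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(males[["ps"]])
_, idx = nn.kneighbors(females[["ps"]])
matched = pd.concat([
    females.assign(pair=range(len(females))),
    males.iloc[idx.ravel()].assign(pair=range(len(females))),
])

# 3. Conditional logistic regression of 30-day mortality on sex, stratified by matched pair.
res = ConditionalLogit(matched["mortality_30d"], matched[["female"]],
                       groups=matched["pair"]).fit()
print("OR:", np.exp(res.params).round(2).to_dict(), "p:", res.pvalues.round(3).to_dict())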
Introduction: Initiation of QTc-prolonging medications may lead to a rare but potentially catastrophic event, torsades de pointes (TdP). At present, no adequate, generalizable tools exist to predict drug-induced long QTc (LQT); machine learning from ECG data is a promising approach. Hypothesis: Prediction of drug-induced LQT using an ECG-based machine learning model is feasible and outperforms a model trained on baseline QTc, age, and sex alone. Methods: We identified baseline 12-lead ECGs with QTc values < 500 ms for patients who had not received any known, conditional, or possible QTc-prolonging medication per CredibleMeds at the time of ECG or within the preceding 90 days. We matched these with ECGs from the same patients while they were taking at least one CredibleMeds drug (“on-drug” ECGs). Using 5-fold cross-validation, we trained and tested two machine learning models on the baseline ECGs of the 92,848 resulting pairs to predict drug-induced LQT (≥500 ms) in the on-drug ECGs: a deep neural network using ECG voltage data, and a gradient-boosted tree using the baseline QTc. Age and sex were also inputs to both models. Results: On-drug LQT prevalence was 16%. The ECG model demonstrated superior performance in predicting on-drug LQT (area under the receiver operating characteristic curve (AUC) = 0.756) compared to the QTc model (0.710). At a potential operating point (Figure), the ECG model had 89% sensitivity and 95% negative predictive value. Even in the subset of patients with baseline QTc < 470/480 ms (male/female; post-drug LQT prevalence = 14%), the ECG model demonstrated good performance (AUC = 0.736). Conclusions: An ECG-based machine learning model can stratify patients by risk of developing drug-induced LQT better than a model using baseline QTc alone. This model may have clinical value in identifying high-risk drug starts that would benefit from closer monitoring, as well as patients at low risk of drug-induced LQT.
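As a hedged sketch of the simpler comparison model described above (not the study's implementation), the snippet below trains a gradient-boosted tree on baseline QTc, age, and sex with 5-fold cross-validation to predict drug-induced LQT (on-drug QTc ≥ 500 ms). The file and column names are assumptions.

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

pairs = pd.read_csv("ecg_pairs.csv")                    # hypothetical baseline/on-drug pairs
X = pairs[["baseline_qtc", "age", "sex"]]               # tabular inputs to the QTc model
y = (pairs["on_drug_qtc"] >= 500).astype(int)           # drug-induced LQT label

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(GradientBoostingClassifier(), X, y, cv=cv, scoring="roc_auc")
print(f"Mean AUC across folds: {aucs.mean():.3f}")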
Background: Several large trials have employed age or clinical features to select patients for atrial fibrillation (AF) screening to reduce strokes. We hypothesized that a deep neural network (DNN) risk prediction model based on the ECG would be superior to age and clinical variables at selecting a population at high risk for AF and AF-related stroke. Methods: We retrospectively included all patients with an ECG at Geisinger without a prior history of AF. Incident AF and AF-related strokes were identified as outcomes within 1 and 3 years after the ECG, respectively. AF-related stroke was defined as a stroke where AF was diagnosed at the time of stroke or within a year after the stroke. We selected a high-risk cohort for AF screening based on five risk stratification methods: criteria from four clinical trials (mSToPS, STROKESTOP, GUARD-AF and SCREEN-AF) and the DNN model applied to the qualifying ECG. We simulated patient selection and evaluated outcomes for twenty 1-year periods between 2010 and 2014, centered on the ECG encounter. For the clinical trial criteria, patients were considered eligible if they met the criteria before or within the period, unless they satisfied exclusion criteria at the time of ECG. Results: The DNN model achieved the best sensitivity (65%), PPV (10%), and NNS for AF (10) of all the risk models in this population, with an NNS for AF-related stroke of 160. Total screening numbers, sensitivity, positive predictive value (PPV), and number needed to screen (NNS) to capture AF and AF-related stroke are summarized in Table 1. The number of additional screens for the DNN model was slightly higher than for two of the other models (SCREEN-AF and STROKESTOP) but lower than for the other two (mSToPS and GUARD-AF). Conclusions: A DNN ECG-based risk prediction model is superior to contemporary AF-screening criteria based on age alone or age and clinical features in selecting a population for additional screening due to high risk for future AF and potential AF-related strokes.
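The screening metrics reported above (sensitivity, PPV, and number needed to screen) can be computed from binary "selected for screening" and outcome flags, as in the sketch below. The data file, column names, and outcome definitions are hypothetical illustrations of one 1-year simulation period.

import pandas as pd

cohort = pd.read_csv("screening_cohort.csv")     # hypothetical one-year simulation cohort
selected = cohort["selected"] == 1               # flagged by a risk model or trial criteria
af = cohort["incident_af"] == 1                  # AF within 1 year
stroke = cohort["af_related_stroke"] == 1        # AF-related stroke within 3 years

sensitivity = (selected & af).sum() / af.sum()           # share of AF cases captured
ppv = (selected & af).sum() / selected.sum()             # AF yield among those screened
nns_af = selected.sum() / (selected & af).sum()          # screens per AF case captured
nns_stroke = selected.sum() / (selected & stroke).sum()  # screens per AF-related stroke captured

print(f"sensitivity={sensitivity:.0%}, PPV={ppv:.0%}, "
      f"NNS(AF)={nns_af:.1f}, NNS(stroke)={nns_stroke:.0f}")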
Background: We aimed to investigate the use of intra-operative heart rate (HR) trajectories to develop and characterize sub-phenotypes of coronary artery bypass grafting (CABG) patients with distinct risk and outcome profiles. Methods: We used a retrospective cohort of 4194 CABG patients admitted to a large rural healthcare system between 2012 and 2021. Functional data analysis (FDA) was applied to patients’ intra-operative HR trajectories to identify distinct sub-phenotypes. Results: The elbow method suggested that the optimal number of clusters is four. Fig. 1 shows the mean HR trajectory curve (top) and Kaplan-Meier survival curve (bottom) for each of the four identified groups. G1 (Low HR) includes 35.2% of the patients, with a median age of 68 years. Patients in G1 had the shortest post-CABG ICU length of stay (LoS), 76.1 hours, and the longest median procedure duration, 4.6 hours. Assignment to G1 was significantly associated with shorter ICU LoS (p = 2.2E-6). G2 (Intermediate HR) includes 35.4% of the patients, with a median age of 67 years. Patients in G2 had the highest prevalence of urgent admissions, 25.5%. Assignment to G2 was significantly associated with shorter ICU LoS (p = 0.016). G3 (Increasing HR) includes 15.6% of the patients, with a median age of 67 years. Patients in G3 had the shortest median procedure duration, 3.68 hours, and were at significantly increased risk of prolonged ICU stay (p = 0.0002). Finally, G4 (High HR) includes 16.2% of the patients, with the lowest median age of 65 years. Patients in G4 had the highest prevalence of in-hospital mortality (5.2%), 30-day readmission (4.3%), and 1-year mortality (4.8%). Assignment to G4 was significantly associated with the following outcomes: prolonged ICU stay (p = 2.62e-9), in-hospital mortality (p = 8.03e-5), 30-day readmission (p = 0.017), and 1-year mortality (p = 0.02). Conclusions: CABG sub-phenotypes based on intra-operative HR trajectories had distinct clinical characteristics and outcomes.
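The abstract above uses functional data analysis; as a simplified approximation (not the study's method), the sketch below resamples each intra-operative HR trajectory onto a common time grid, clusters the resampled curves with k-means, and inspects within-cluster inertia for the elbow. All data and names here are toy illustrations.

import numpy as np
from sklearn.cluster import KMeans

def resample(times, hr, n_points=100):
    """Interpolate one HR trajectory onto a common normalized time grid."""
    grid = np.linspace(0, 1, n_points)
    t = (np.asarray(times) - times[0]) / (times[-1] - times[0])
    return np.interp(grid, t, hr)

# trajectories: list of (times_in_minutes, hr_values) per CABG case; toy data for illustration.
rng = np.random.default_rng(0)
trajectories = [(np.sort(rng.random(50)) * 240, 70 + rng.normal(size=50).cumsum())
                for _ in range(200)]
X = np.vstack([resample(t, hr) for t, hr in trajectories])

# Elbow method: look for the k where inertia stops dropping sharply (the study found k = 4).
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(2, 9)}
print(inertias)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)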
Several large trials have employed age or clinical features to select patients for atrial fibrillation (AF) screening to reduce strokes. We hypothesized that a machine learning (ML) model trained to predict AF risk from the 12-lead electrocardiogram (ECG) would be more efficient than criteria based on clinical variables at indicating a population for AF screening to potentially prevent AF-related stroke. We retrospectively included all patients with clinical encounters at Geisinger without a prior history of AF. Incident AF within 1 year and AF-related strokes within 3 years of the encounter were identified. AF-related stroke was defined as a stroke where AF was diagnosed at the time of stroke or within a year after the stroke. The efficiency of five methods for selecting a cohort for AF screening was evaluated: criteria from four clinical trials (mSToPS, GUARD-AF, SCREEN-AF and STROKESTOP) and the ECG-based ML model. We simulated patient selection for the five methods between 2011 and 2014 and evaluated outcomes over 1-year intervals between 2012 and 2015, resulting in a total of twenty 1-year periods. Patients were considered eligible if they met the criteria before the start of a given 1-year period or within that period. The primary outcomes were the numbers needed to screen (NNS) for AF and AF-associated stroke. The clinical trial criteria indicated large proportions of the population with a prior ECG for AF screening (up to 31%), with NNS ranging from 14 to 18 for AF and from 249 to 359 for AF-associated stroke. At comparable sensitivity, the ECG ML model indicated a more modest number of patients for screening (14%) and had the highest efficiency, with an NNS of 7.3 for AF (up to a 60% reduction) and 223 for AF-associated stroke (up to a 38% reduction). An ECG-based ML risk prediction model is more efficient than contemporary AF-screening criteria based on age alone or age and clinical features at indicating a population for AF screening to potentially prevent AF-related strokes.
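As a worked check of the relative NNS reductions quoted above, the snippet below compares the ML model's NNS against the least efficient clinical-trial NNS values (18 for AF, 359 for AF-associated stroke), which is one plausible reading of "up to".

nns_ml_af, nns_trial_af = 7.3, 18
nns_ml_stroke, nns_trial_stroke = 223, 359

# Relative reduction in screens per captured case versus the least efficient trial criteria.
print(f"AF NNS reduction: {(nns_trial_af - nns_ml_af) / nns_trial_af:.0%}")                # ~59%
print(f"Stroke NNS reduction: {(nns_trial_stroke - nns_ml_stroke) / nns_trial_stroke:.0%}")  # ~38%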