ABSTRACT Severity scores assess the acuity of critical illness by penalizing deviations of physiologic measurements from normal and aggregating these penalties (also called “weights” or “subscores”) into a final score (or probability) that quantifies the severity of critical illness (or the likelihood of in-hospital mortality). Although these simple additive models are human readable and interpretable, their predictive performance leaves room for improvement. To address this need, we argue for replacing these simple additive models with models based on state-of-the-art non-linear supervised learning algorithms (e.g., Random Forest (RF) and eXtreme Gradient Boosting (XGB)). Specifically, we present OASIS+, a variant of the Oxford Acute Severity of Illness Score (OASIS) in which an ensemble of 200 decision trees is used to predict in-hospital mortality from the same 10 clinical variables used in OASIS. Using a test set of 9566 admissions extracted from the MIMIC-III database, we show that OASIS+ substantially improves on OASIS, raising the AUC from 0.77 to 0.83. Moreover, we show that OASIS+ outperforms eight other commonly used severity scoring methods. Our results underscore the potential of improving existing severity scores by using more sophisticated machine learning algorithms (e.g., ensembles of non-linear decision trees), not just by including additional physiologic measurements.
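To make the contrast above concrete, the following is a minimal sketch of how an additive severity score turns per-variable penalties into a total and then a probability, and of how the AUC can be computed as the probability that a randomly chosen non-survivor scores higher than a randomly chosen survivor. The variable names, bin edges, penalty weights and logistic coefficients are hypothetical placeholders, not the published OASIS tables, and the sketch does not implement OASIS+ itself.

```python
import math
from bisect import bisect_right

# HYPOTHETICAL penalty tables for illustration only; these are NOT the
# published OASIS variables, bin edges, or weights.
# Each entry: (ascending bin edges, penalty for each of the len(edges) + 1 bins).
PENALTY_TABLES = {
    "heart_rate": ([40, 90, 125], [4, 0, 1, 3]),
    "resp_rate": ([6, 13, 23], [10, 1, 0, 6]),
    "gcs": ([8, 14], [10, 4, 0]),
}


def subscore(variable: str, value: float) -> int:
    """Return the penalty for one measurement by locating the bin it falls in."""
    edges, penalties = PENALTY_TABLES[variable]
    return penalties[bisect_right(edges, value)]


def additive_score(patient: dict) -> int:
    """Sum the per-variable penalties into a single severity score."""
    return sum(subscore(var, val) for var, val in patient.items())


def score_to_probability(score: float, intercept: float = -6.0, slope: float = 0.2) -> float:
    """Map a total score to a mortality probability with a logistic link.

    The intercept and slope are hypothetical; in practice they are fitted to data.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def auc(pos_scores, neg_scores) -> float:
    """AUC as P(score of a non-survivor > score of a survivor), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


patient = {"heart_rate": 130, "resp_rate": 25, "gcs": 7}
score = additive_score(patient)  # 3 + 6 + 10 = 19
prob = score_to_probability(score)
```

In this framing, OASIS+ keeps the same inputs but replaces the hand-set `additive_score` and `score_to_probability` steps with a learned non-linear ensemble, while an AUC computation such as `auc` applies unchanged to the outputs of either model, which is how the two can be compared on the same test set.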
Background: The 12-lead electrocardiogram (ECG) is a widely used medical test with significant prognostic potential. We hypothesized that a deep neural network can predict an important future clini...
Introduction: A deep learning ECG algorithm, rECHOmmend, can accurately identify patients with any of seven structural heart diseases: five valvular diseases, low ejection fraction, and interventricular septal (IVS) thickening. Components of the rECHOmmend composite label (IVS >15 mm, mitral regurgitation) are also associated with hypertrophic cardiomyopathy (HCM). We hypothesized that, despite being trained without HCM-specific labels, rECHOmmend can reliably identify HCM patients and achieve performance comparable to that of an HCM-specific classifier. Methods: Algorithms were developed from 2,898,979 ECGs acquired from 661,366 patients between 1984 and 2021. rECHOmmend was trained on a composite label derived from echocardiography and electronic health record (EHR) data. This ensemble consists of 7 disease-specific models and an aggregate model that predicts a composite structural heart disease endpoint with shared clinical actionability. Separately, an HCM-specific model was trained on a binary label derived from the EHR. To enable comparison, both classifiers were tested on a shared ECG holdout set (ECG-level prevalence 1.24%, patient-level prevalence 0.52%). Results: Despite being trained without HCM-specific labels, the rECHOmmend ensemble showed performance comparable to that of an HCM-specific classifier (C-statistic: 0.92 [0.90-0.93] vs 0.90 [0.89-0.91]). At an operating point optimized for the F1-score, sensitivity to HCM was higher for rECHOmmend (0.42 [0.33-0.50]) than for the HCM-specific classifier (0.18 [0.15-0.21]). rECHOmmend sustained performance across a range of IVS thicknesses, suggesting that it did not rely solely on IVS thickening to identify HCM and that other ensemble components contributed to performance. Conclusions: A composite deep learning algorithm trained to identify structural heart diseases can identify clinically ascertained HCM with good performance, despite being trained without HCM-specific labels.
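An operating point "optimized for the F1-score", as used above, can be illustrated with a short sketch: sweep candidate thresholds over the predicted scores, compute F1 and sensitivity at each, and keep the threshold with the highest F1. The abstract does not say how candidate thresholds were enumerated, so this is one plausible procedure rather than the authors' method, and the scores and labels in the example are made up.

```python
def confusion_at(scores, labels, threshold):
    """Count TP, FP, FN when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def best_f1_threshold(scores, labels):
    """Return (threshold, f1, sensitivity) maximizing F1 over the observed score values."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, fn = confusion_at(scores, labels, t)
        f1 = f1_from_counts(tp, fp, fn)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        if best is None or f1 > best[1]:
            best = (t, f1, sensitivity)
    return best


# Made-up model outputs and HCM labels (1 = HCM) for illustration.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 1, 0]
threshold, f1, sensitivity = best_f1_threshold(scores, labels)
```

Because F1 ignores true negatives, the F1-optimal threshold depends strongly on prevalence, so the same classifier can land at quite different sensitivities on datasets with different HCM prevalence.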
Abstract Background The Mayo endoscopic subscore (MES), assigned by human readers, is widely used to assess endoscopic disease severity in Ulcerative Colitis (UC) clinical trials. AI-based automation of the MES could reduce inter-rater variability and allow for the development of more sensitive endoscopic measures. This report assesses whether a previously trained (locked) algorithm is suitable for automating full colon or segment-level MES scoring in a prospective UC clinical trial. Methods Endoscopy videos from two UC clinical trials (UNIFI: NCT02407236, Phase 3, 965 subjects, 3128 videos; and JAK-UC: NCT01959282, Phase 2, 211 subjects, 448 videos) were used to train an AI-based MES classifier, with 20% of the total data retained as a holdout set. Model training had two steps: 1) training a feature extraction module using self-supervised learning (SSL), and 2) supervised training of a small transformer network with an attention-based classifier on the SSL features to estimate full colon MES. Videos from an independent, prospective UC trial (QUASAR: NCT04033445, Phase 2b induction study, 313 subjects, 615 videos) were used to validate the locked AI model. MES scoring in QUASAR included full colon MES values and additional MES values for three left colon segments: descending colon, sigmoid colon, and rectum. AI-model and human reader scores were compared using AUC, Accuracy, F1 score, and Fleiss kappa. A non-inferiority test was also conducted to determine interchangeability between AI- and human-derived full colon MES values. Results Full colon MES on the QUASAR data showed AUC, Accuracy, and F1 scores of 0.810, 0.687, and 0.693, respectively, comparable to results obtained on the UNIFI holdout data (0.803, 0.645, and 0.647). The Fleiss kappa score was 0.682, comparable to the inter-rater agreement between two human readers, the local and central readers (Fleiss kappa = 0.712).
The non-inferiority test (p-value < 0.05) indicated that the AI-computed full colon MES readout was interchangeable with that of human readers. Similar performance was observed for the AI-computed segment-level MES for the descending colon, sigmoid colon, and rectum, as shown in Table 1. This result demonstrates the model's effectiveness at scoring segment-level MES despite not being trained with segment-level ground truth. Conclusion ArgesMES, an AI-based model, was successfully validated prospectively, demonstrating proficiency in automating full colon and segment-level MES scoring. ArgesMES has the potential to facilitate rapid, reliable, and reproducible MES scoring at full colon and segment levels in prospective clinical trials.
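Fleiss kappa, used above to compare AI and human MES readouts, can be computed from a table giving, for each video, how many raters assigned each MES category (0 to 3). The sketch below follows the standard Fleiss formula, kappa = (P̄ − P_e) / (1 − P_e), where P̄ is the mean per-subject agreement and P_e the agreement expected from the category marginals. Treating the AI model as one of two raters is an assumption about how the comparison was set up, and the toy counts are invented.

```python
def fleiss_kappa(counts):
    """Fleiss kappa for counts[i][j] = number of raters assigning subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number (>= 2) of raters")
    n_categories = len(counts[0])
    # Marginal proportion of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters) for j in range(n_categories)]
    # Observed agreement for each subject: agreeing rater pairs / all rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance from the category marginals.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Invented example: 5 videos, 2 raters (e.g., AI model and a human reader), MES 0-3.
table = [
    [2, 0, 0, 0],  # both rated MES 0
    [0, 0, 2, 0],  # both rated MES 2
    [0, 1, 1, 0],  # one rated MES 1, the other MES 2
    [0, 0, 0, 2],  # both rated MES 3
    [0, 2, 0, 0],  # both rated MES 1
]
kappa = fleiss_kappa(table)  # (0.8 - 0.26) / (1 - 0.26), roughly 0.73
```

For production analyses, an established implementation such as the one in statsmodels is preferable; this sketch only shows what the statistic measures.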
Background: Several large trials have used age or clinical features to select patients for atrial fibrillation (AF) screening to reduce strokes. We hypothesized that an ECG-based deep neural network (DNN) risk prediction model would be superior to age and clinical variables at selecting a population at high risk for AF and AF-related stroke. Methods: We retrospectively included all patients with an ECG at Geisinger without a prior history of AF. Incident AF and AF-related strokes were identified as outcomes within 1 and 3 years after the ECG, respectively. AF-related stroke was defined as a stroke for which AF was diagnosed at the time of the stroke or within a year after it. We selected a high-risk cohort for AF screening using five risk stratification methods: the criteria from four clinical trials (mSToPS, STROKESTOP, GUARD-AF and SCREEN-AF) and the DNN model applied to the qualifying ECG. We simulated patient selection and evaluated outcomes for twenty 1-year periods between 2010 and 2014 centered around the ECG encounter. For the clinical trials, patients were considered eligible if they met the criteria before or within the period, unless they satisfied exclusion criteria at the time of the ECG. Results: Compared with all other risk models, the DNN model achieved the best sensitivity (65%), positive predictive value (PPV, 10%) and number needed to screen (NNS) for AF (10) within this population, with an NNS for AF-related stroke of 160. Total screening number, sensitivity, PPV and NNS to capture AF and AF-related stroke are summarized in Table 1. The number of additional screens for the DNN model was slightly higher than for two of the other models (SCREEN-AF and STROKESTOP) but lower than for the other two (mSToPS and GUARD-AF).
Conclusions: A DNN ECG-based risk prediction model is superior to contemporary AF-screening criteria based on age alone or on age and clinical features in selecting a population at high risk of future AF and potential AF-related stroke for additional screening.
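The screening metrics reported above are related by simple arithmetic: sensitivity is the share of all future cases that fall inside the screened cohort, PPV is the share of the screened cohort that develops the outcome, and, assuming NNS is defined as screened patients per captured case, NNS is the reciprocal of PPV (a PPV of 10% corresponds to an NNS of 10). The sketch below computes these from hypothetical counts; the class name and the counts are illustrative and not taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    """Hypothetical container for one risk model's screening outcome."""

    screened: int           # patients selected for screening by the risk model
    cases_in_screened: int  # screened patients who developed the outcome (e.g., AF)
    total_cases: int        # all patients in the population who developed the outcome

    @property
    def sensitivity(self) -> float:
        """Fraction of all outcome cases captured by the screened cohort."""
        return self.cases_in_screened / self.total_cases

    @property
    def ppv(self) -> float:
        """Fraction of screened patients who developed the outcome."""
        return self.cases_in_screened / self.screened

    @property
    def nns(self) -> float:
        """Patients screened per captured case (1 / PPV); infinite if none captured."""
        if self.cases_in_screened == 0:
            return math.inf
        return self.screened / self.cases_in_screened


# Hypothetical counts for illustration only.
cohort = ScreeningResult(screened=1000, cases_in_screened=100, total_cases=160)
```

The same class applies to AF-related stroke by counting strokes instead of AF cases; a rarer outcome yields a lower PPV and therefore a larger NNS, as in the 10 versus 160 contrast above.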
Introduction: Heart failure is a prevalent, costly disease for which new value-based payment models demand optimized population management strategies. We aimed to generate a novel strategy for mana...