Background: The 12-lead electrocardiogram (ECG) is a widely-used medical test with significant prognostic potential. We hypothesized that a deep neural network can predict an important future clini...
Predicting future clinical events helps physicians guide appropriate intervention. Machine learning has tremendous promise to assist physicians with predictions based on the discovery of complex patterns from historical data, such as large, longitudinal electronic health records (EHR). This study is a first attempt to demonstrate such capabilities using raw echocardiographic videos of the heart. We show that a large dataset of 723,754 clinically-acquired echocardiographic videos (~45 million images) linked to longitudinal follow-up data in 27,028 patients can be used to train a deep neural network to predict 1-year mortality with good accuracy (area under the curve (AUC) in an independent test set = 0.839). Prediction accuracy was further improved by adding EHR data (AUC = 0.858). Finally, we demonstrate that the trained neural network was more accurate in mortality prediction than two expert cardiologists. These results highlight the potential of neural networks to add new power to clinical predictions.
Introduction: Recent studies using deep learning techniques have demonstrated promising left ventricular ejection fraction (LVEF) assessment from transthoracic echocardiograms (TTEs). However, most prior studies have focused on videos from a single apical view, a technique known to be subject to limitations given the regionality of LV systolic function. We hypothesized that a deep learning model trained on echocardiographic video clips from multiple views in a large dataset would improve accuracy in LVEF assessment. Methods: We identified all adult TTEs with a clinically reported LVEF at Columbia University between 2019 and 2024. A view classification model was trained to identify apical 4-chamber and 2-chamber and parasternal long-axis and short-axis views for LVEF assessment. The internal dataset was split into training, validation, and test sets to train a spatiotemporal convolutional model for each of the 4 views to assess LVEF for each video clip. The median clip-level LVEF within a study was used to derive a study-level LVEF. The model was evaluated on an internal test set and a large external test set, which included all available adult TTEs from Weill Cornell Medical Center since 2011. As a benchmark comparison, the previously published EchoNet-Dynamic model was also evaluated on the external test set. Results: The model was trained and validated on 97,566 internal studies, comprising 1,424,265 videos from 60,741 unique patients. The model achieved state-of-the-art performance on the internal test set (16,396 studies), with a mean absolute error (MAE) of 3.4% and a root mean squared error (RMSE) of 4.6%. Multi-view results were superior to those of all single-view models. The model showed robust predictions on the external test set (179,298 studies), with an MAE of 5.6% and an RMSE of 7.1%, and outperformed EchoNet-Dynamic (Table). Conclusions: We developed a deep learning model trained on multiple echocardiographic views using the largest dataset to date.
Our model achieved state-of-the-art accuracy in assessing LVEF with a level of agreement between the AI and cardiologist LVEF assessments comparable to cardiologist interobserver variability. Further studies are underway to study the implementation of these models within clinical systems.
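The aggregation and error metrics described in the Methods and Results can be illustrated with a minimal sketch. It assumes clip-level predictions arrive as (study ID, LVEF) pairs; the function names and the toy numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from math import sqrt
from statistics import median


def study_level_lvef(clip_predictions):
    """Collapse (study_id, clip_lvef) pairs into one median LVEF per study."""
    by_study = defaultdict(list)
    for study_id, lvef in clip_predictions:
        by_study[study_id].append(lvef)
    return {study_id: median(values) for study_id, values in by_study.items()}


def mae_and_rmse(predicted, reported):
    """Return (MAE, RMSE) over the studies present in both mappings."""
    shared = sorted(set(predicted) & set(reported))
    errors = [predicted[s] - reported[s] for s in shared]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Made-up clip-level predictions (LVEF in percentage points), not study data.
clips = [("A", 55.0), ("A", 61.0), ("A", 58.0), ("B", 30.0), ("B", 36.0)]
reported = {"A": 60.0, "B": 35.0}

study_lvef = study_level_lvef(clips)
mae, rmse = mae_and_rmse(study_lvef, reported)
```

Taking the median rather than the mean keeps a single badly predicted clip from dominating the study-level value, which is one plausible reason for that design choice.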
The electrocardiogram (ECG) is a widely-used medical test, typically consisting of 12 voltage versus time traces collected from surface recordings over the heart. Here we hypothesize that a deep neural network can predict an important future clinical event (one-year all-cause mortality) from ECG voltage-time traces. We show good performance for predicting one-year mortality with an average AUC of 0.85 from a model cross-validated on 1,775,926 12-lead resting ECGs that were collected over a 34-year period in a large regional health system. Even within the large subset of ECGs interpreted as 'normal' by a physician (n=297,548), the model's performance in predicting one-year mortality remained high (AUC=0.84), and a Cox proportional hazards model revealed a hazard ratio of 6.6 (p<0.005) for the two predicted groups (dead vs alive one year after ECG) over a 30-year follow-up period. A blinded survey of three cardiologists suggested that the patterns captured by the model were generally not visually apparent to cardiologists even after being shown 240 paired examples of labeled true positives (dead) and true negatives (alive). In summary, deep learning can add significant prognostic information to the interpretation of 12-lead resting ECGs, even in cases that are interpreted as 'normal' by physicians.
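For readers unfamiliar with the AUC values quoted above, the following is a minimal sketch of how an AUC can be computed from binary outcomes and model scores, using its interpretation as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The function name and the toy data are hypothetical; the study's own evaluation code is not described in the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = died within one year, 0 = alive (made-up values).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic and only meant to make the definition visible; a rank-based implementation would be preferable at the scale of millions of ECGs.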
Use of machine learning for automated annotation of heart structures from echocardiographic videos is an active research area, but understanding of comparative, generalizable performance among models is lacking. This study aimed to 1) assess the generalizability of five state-of-the-art machine learning-based echocardiography segmentation models within a large clinical dataset, and 2) test the hypothesis that a quality control (QC) method based on segmentation uncertainty can further improve segmentation results. Five models were applied to 47,431 echocardiography studies that were independent from any training samples. Chamber volume and mass from model segmentations were compared to clinically-reported values. The median absolute errors (MAE) in left ventricular (LV) volumes and ejection fraction exhibited by all five models were comparable to reported inter-observer errors (IOE). MAE for left atrial volume and LV mass were similarly favorable to respective IOE for models trained for those tasks. A single model consistently exhibited the lowest MAE in all five clinically-reported measures. We leveraged the 10-fold cross-validation training scheme of this best-performing model to quantify segmentation uncertainty for potential application as QC. We observed that filtering segmentations with high uncertainty improved segmentation results, leading to decreased volume/mass estimation errors. The addition of contour-convexity filters further improved QC efficiency. In conclusion, five previously published echocardiography segmentation models generalized to a large, independent clinical dataset (segmenting one or multiple cardiac structures with overall accuracy comparable to manual analyses) with variable performance. Convexity-reinforced uncertainty QC efficiently improved segmentation performance and may further facilitate the translation of such models.
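The abstract does not specify how segmentation uncertainty was computed from the cross-validation models. One plausible sketch, offered purely as an assumption, is to measure disagreement among the masks produced by the fold models (here, one minus the mean pairwise Dice overlap) and to reject segmentations whose disagreement exceeds a cutoff. The function names, the flat 0/1 mask representation, and the 0.1 cutoff are all hypothetical.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    overlap = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 1.0 if total == 0 else 2.0 * overlap / total


def ensemble_uncertainty(fold_masks):
    """One minus the mean pairwise Dice across the fold models' masks."""
    pairs = list(combinations(fold_masks, 2))
    return 1.0 - sum(dice(a, b) for a, b in pairs) / len(pairs)


def passes_qc(fold_masks, max_uncertainty=0.1):
    """Keep a segmentation only if the fold models largely agree (assumed cutoff)."""
    return ensemble_uncertainty(fold_masks) <= max_uncertainty


# Toy masks from three hypothetical fold models on a 4-pixel image.
agree = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
disagree = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
```

In this toy case the agreeing masks pass and the disagreeing ones are filtered out; a convexity check on the final contour, as mentioned in the abstract, would be an additional and separate filter.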
Background: Atrial fibrillation (AF) is associated with stroke, especially when AF goes undetected. Deep neural networks (DNN) can predict incident AF from a 12-lead resting ECG. We hypothesize that use of a DNN to predict new onset AF from an ECG may identify patients at risk of sustaining a potentially preventable AF-related stroke. Methods: We trained a DNN model to predict new-onset AF using 382,604 ECGs prior to 2010. We then evaluated the model performance on a test set of ECGs from 2010 through 2014 linked to patients in an institutional stroke registry. There were 181,969 patients in the test set with at least one ECG and no prior history of AF. Of those patients, 3,497 (1.9%) had a stroke following an ECG that did not show AF. Within the set of patients with stroke, 375 had the stroke within 3 years of the ECG and were diagnosed with new AF between -3 and 365 days of the stroke. We considered these potentially preventable AF-related strokes. We report the sensitivity and positive predictive value (PPV) of the model for appropriately risk stratifying these 375 patients who sustained a potentially preventable AF-related stroke. Results: We used Fβ scores to identify different risk prediction thresholds (operating points) for the model. Operating points chosen by F0.5, F1, and F2 scores identified 4, 12, and 21% of the population as high risk for the development of AF within 1 year (Figure 1). Screening 1, 4, 12, and 21% of the overall population resulted in PPV of 28, 21, 15, and 12%, respectively, for identification of new onset AF in one year. Using those same thresholds yielded sensitivities of 4, 17, 45, and 62% for identifying potentially preventable AF-related strokes. The different risk prediction thresholds resulted in a low (120-162) number needed to screen to detect one potentially preventable AF-related stroke at 3 years.
Conclusions: Use of a deep learning model to predict new onset AF may identify patients at high risk of sustaining a potentially preventable AF-related stroke.
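Choosing operating points by Fβ score can be sketched as follows. Fβ combines precision (PPV) and recall (sensitivity), with β > 1 weighting recall more heavily, so larger β values tend to select lower thresholds and flag more of the population. The function names and the toy data are hypothetical, and the exhaustive search over observed scores is an assumption rather than the authors' procedure.

```python
def f_beta(precision, recall, beta):
    """F-beta score; returns 0.0 when both precision and recall are zero."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(labels, scores, beta):
    """Score threshold (flag if score >= threshold) that maximizes F-beta."""
    best_score, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        flagged = [s >= cut for s in scores]
        tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
        fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = f_beta(precision, recall, beta)
        if score > best_score:
            best_score, best_cut = score, cut
    return best_cut


# Toy data: 1 = developed AF within one year (made-up values).
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
thresholds = {beta: best_threshold(labels, scores, beta) for beta in (0.5, 1, 2)}
```

On this toy data the selected threshold falls as β rises, so a growing share of patients is flagged, which mirrors the pattern reported for the F0.5, F1, and F2 operating points.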
Background: Several large trials have employed age or clinical features to select patients for atrial fibrillation (AF) screening to reduce strokes. We hypothesized that a deep neural network (DNN) model risk prediction based on ECG would be superior to age and clinical variables at selecting a population at high risk for AF and AF-related stroke. Methods: We retrospectively included all patients with an ECG at Geisinger without a prior history of AF. Incident AF and AF-related stroke were identified as outcomes within 1 and 3 years after the ECG, respectively. AF-related stroke was defined as a stroke where AF was diagnosed at the time of stroke or within a year after the stroke. We selected a high-risk cohort for AF screening based on five risk stratification methods: criteria from four clinical trials (mSToPS, STROKESTOP, GUARD-AF and SCREEN-AF) and the DNN model at the qualifying ECG. We simulated patient selection and evaluated outcomes for twenty 1-year periods between 2010 and 2014 centered around the ECG encounter. For the clinical trials, the patients were considered eligible if they met the criteria before or within the period unless they satisfied exclusion criteria at the time of ECG. Results: The DNN model achieved the best sensitivity (65%), positive predictive value (PPV; 10%), and number needed to screen (NNS) for AF (10) within this population compared with all other risk models, with an NNS for AF-related stroke of 160. Total screening number, sensitivity, PPV, and NNS to capture AF and AF-related stroke are summarized in Table 1. The number of additional screens for the DNN model was slightly higher than for two of the other models (SCREEN-AF and STROKESTOP) but lower than for the other two (mSToPS and GUARD-AF).
Conclusions: A DNN ECG-based risk prediction model is superior to contemporary AF-screening criteria based on age alone or age and clinical features in selecting a population for additional screening due to high risk for future AF and potential AF-related strokes.
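The screening metrics compared across risk models can be illustrated with a short sketch. It assumes NNS is defined as the number of patients selected for screening per outcome captured, i.e. the reciprocal of PPV, which is consistent with the reported PPV of 10% and NNS of 10 for AF. The function name and the toy cohort are hypothetical.

```python
def screening_metrics(selected, had_outcome):
    """Return (sensitivity, PPV, NNS) for one screening rule.

    selected[i] is True if patient i was selected for screening;
    had_outcome[i] is True if patient i went on to have the outcome.
    """
    tp = sum(1 for s, o in zip(selected, had_outcome) if s and o)
    n_selected = sum(1 for s in selected if s)
    n_outcome = sum(1 for o in had_outcome if o)
    sensitivity = tp / n_outcome if n_outcome else 0.0
    ppv = tp / n_selected if n_selected else 0.0
    # Assumed definition: patients screened per outcome captured (1 / PPV).
    nns = n_selected / tp if tp else float("inf")
    return sensitivity, ppv, nns


# Toy cohort of 20 patients: the first 10 are selected, 2 have the outcome.
selected = [i < 10 for i in range(20)]
had_outcome = [i in (0, 15) for i in range(20)]
sensitivity, ppv, nns = screening_metrics(selected, had_outcome)
```

Here one of the two outcomes falls in the selected group, giving a sensitivity of 50%, a PPV of 10%, and an NNS of 10.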
A large number of cellular level abnormalities have been identified in the hippocampus of schizophrenic subjects. Nonetheless, it remains uncertain how these pathologies interact at a system level to create clinical symptoms, and this has hindered the development of more effective antipsychotic medications. Using a 72-processor supercomputer, we created a tissue level hippocampal simulation, featuring multicompartmental neuron models with multiple ion channel subtypes and synaptic channels with realistic temporal dynamics. As an index of the schizophrenic phenotype, we used the specific inability of the model to attune to 40 Hz (gamma band) stimulation, a well-characterized abnormality in schizophrenia. We examined several possible combinations of putatively schizophrenogenic cellular lesions by systematically varying model parameters representing NMDA channel function, dendritic spine density, and GABA system integrity, conducting 910 trials in total. Two discrete "clusters" of neuropathological changes were identified. The most robust was characterized by co-occurring modest reductions in NMDA system function (-30%) and dendritic spine density (-30%). Another set of lesions had greater NMDA hypofunction along with low level GABA system dysregulation. To the schizophrenic model, we applied the effects of 1,500 virtual medications, which were implemented by varying five model parameters, independently, in a graded manner; the effects of known drugs were also applied. The simulation accurately distinguished agents that are known to lack clinical efficacy, and identified novel mechanisms (e.g., decrease in AMPA conductance decay time constant, increase in projection strength of calretinin-positive interneurons) and combinations of mechanisms that could re-equilibrate model behavior. These findings shed light on the mechanistic links between schizophrenic neuropathology and the gamma band oscillatory abnormalities observed in the illness. 
As such, they generate specific falsifiable hypotheses, which can guide postmortem and other laboratory research. Significantly, this work also suggests specific non-obvious targets for potential pharmacologic agents.
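The abstract uses the model's ability to attune to 40 Hz stimulation as its index of the schizophrenic phenotype but does not state how that was quantified. One simple way to express such an index, given here only as an assumption, is the spectral power of the simulated output at the drive frequency. The function name, the 1 kHz sampling rate, and the synthetic signals are hypothetical.

```python
import math


def power_at(signal, sample_rate_hz, freq_hz):
    """Normalized spectral power of `signal` at one frequency (single-bin DFT)."""
    n = len(signal)
    step = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(x * math.cos(step * i) for i, x in enumerate(signal))
    im = sum(x * math.sin(step * i) for i, x in enumerate(signal))
    return (re * re + im * im) / (n * n)


# One second of synthetic output sampled at 1 kHz (made-up signals).
rate = 1000
t = [i / rate for i in range(rate)]
entrained = [math.sin(2 * math.pi * 40 * x) for x in t]       # follows the 40 Hz drive
attenuated = [0.2 * math.sin(2 * math.pi * 40 * x) for x in t]  # weak 40 Hz response
off_band = [math.sin(2 * math.pi * 20 * x) for x in t]        # responds at 20 Hz instead

p_entrained = power_at(entrained, rate, 40.0)
p_attenuated = power_at(attenuated, rate, 40.0)
p_off_band = power_at(off_band, rate, 40.0)
```

Under this assumed index, a parameter setting would count as reproducing the gamma band deficit when its 40 Hz power falls well below that of the unlesioned model.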