Performance of Heart Failure Clinical Prediction Models: A Systematic External Validation Study

Jenica N. Upshaw, Jason Nelson, Benjamin Koethe, Park Jg, Hannah L. McGinnes, Benjamin S. Wessler, Marvin A. Konstam, James E. Udelson, Van Calster B, van Klaveren D, Ewout W. Steyerberg, David M. Kent

2021

Background: Most heart failure (HF) clinical prediction models (CPMs) have not been externally validated.

Methods: We performed a systematic review to identify CPMs predicting outcomes in HF, stratified by acute and chronic HF CPMs. External validations were performed using individual patient data from 8 large HF trials (1 acute, 7 chronic). CPM discrimination (c-statistic, % relative change in c-statistic), calibration (calibration slope, Harrell's E, E90), and net benefit were evaluated for each CPM with and without recalibration.

Results: Of 135 HF CPMs screened, 24 (18%) were compatible with the population, predictors and outcomes of the trials, and 42 external validations were performed (14 acute HF, 28 chronic HF). For acute HF CPMs, the median derivation c-statistic was 0.76 (IQR, 0.75, 0.8), the validation c-statistic was 0.67 (0.65, 0.68) and the model-based c-statistic was 0.68 (0.66, 0.76). Hence, most of the apparent decrement in model performance was due to narrower case-mix in the validation cohort compared with the development cohort. For chronic HF CPMs, the median derivation c-statistic was 0.76 (0.74, 0.8), the validation c-statistic was 0.61 (0.6, 0.63) and the model-based c-statistic was 0.68 (0.62, 0.71), suggesting that the decrement in model performance was only partially due to case-mix heterogeneity. Calibration was generally poor: median E (standardized by outcome rate) was 0.5 (0.4, 2.2) for acute HF CPMs and 0.5 (0.3, 0.7) for chronic HF CPMs. Updating the intercept alone led to a significant improvement in calibration in acute HF CPMs, but not in chronic HF CPMs. Net benefit analysis showed potential for harm in using CPMs when the decision threshold was not near the overall outcome rate, but this improved with model recalibration.

Conclusions: Only a small minority of published CPMs contained variables and outcomes that were compatible with the clinical trial datasets. For acute HF CPMs, discrimination is largely preserved after adjusting for case-mix; however, the risk of net harm is substantial without model recalibration for both acute and chronic HF CPMs.
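
To make two of the metrics named in the abstract concrete, the following is a minimal sketch of the c-statistic and of net benefit at a single decision threshold. It uses the standard textbook definitions (pairwise concordance with ties counted as one half, and true positives minus threshold-weighted false positives per patient); it is not the authors' code, and the function names and the toy data are assumptions made only for illustration.

```python
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y: Sequence[int], threshold: float) -> float:
    """Net benefit of the default strategy of treating every patient."""
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: observed outcomes and predicted risks for 8 patients.
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
risks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

c = c_statistic(outcomes, risks)
nb_model = net_benefit(outcomes, risks, threshold=0.5)
nb_all = net_benefit_treat_all(outcomes, threshold=0.5)
```

A model is potentially harmful at a given threshold when its net benefit falls below the better of the two default strategies, treat all (`net_benefit_treat_all`) and treat none (net benefit of zero). Repeating the comparison over a range of thresholds gives a decision curve.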
