Assessing the Value of Unsupervised Clustering in Detecting Key Classes of Diagnostic and Medication Codes to Improve the Prediction of Persistent High Healthcare Utilizers.

2021 
BACKGROUND A high proportion of healthcare services are persistently utilized by a small subpopulation of patients. To improve clinical outcomes while reducing cost and utilization, population health management programs often provide targeted interventions to patients who may become persistent high users/utilizers (PHUs). Enhanced prediction and management of PHUs can improve healthcare system efficiency and the overall quality of patient care.

OBJECTIVE To detect key classes of diseases and medications among the study population, and to assess the predictive value of these classes in identifying PHUs.

METHODS This study is a retrospective analysis of insurance claims data of patients from the Johns Hopkins Health Care system. We defined a PHU as a patient incurring healthcare costs in the top 20% of all patients' costs for four consecutive 6-month periods. We used 2013 claims data to predict PHU status in 2014-2015. We applied Latent Class Analysis (LCA), an unsupervised clustering approach, to identify patient subgroups with similar diagnostic and medication patterns and to differentiate variations in healthcare utilization across PHUs. Logistic regression models were then built to predict PHUs in the full population and in select subpopulations. Predictors included LCA membership probabilities, demographic covariates, and health utilization covariates. The predictive power of the regression models was assessed and compared using standard metrics.

RESULTS We identified 164,221 patients with continuous enrollment between 2013 and 2015. The mean age of the study population was 19.7 years, 55.9% were female, 3.3% had ≥1 hospitalization, and 19.1% had ≥10 outpatient visits in 2013. A total of 8,359 (5.1%) patients were identified as PHUs in both 2014 and 2015. The LCA performed optimally when assigning patients to four probabilistic disease/medication classes. Given the feedback provided by clinical experts, we further divided the population into four diagnostic groups for sensitivity analysis: Acute Upper Respiratory Infection (URI) (n=53,232; 4.6% PHUs), Mental Health (n=34,456; 12.8% PHUs), Otitis Media (n=24,992; 4.5% PHUs), and Musculoskeletal (n=24,799; 15.5% PHUs). For the regression models predicting PHUs in the full population, the F1-score was lower for a parsimonious model that included LCA categories (F1=38.62%) than for a complex risk stratification model with a full set of predictors (F1=48.20%). However, the LCA-enabled simple models were comparable to the complex model when predicting PHUs in the Mental Health and Musculoskeletal subpopulations (F1-scores of 48.69% and 48.15%, respectively). F1-scores were lower than that of the complex model when the LCA-enabled models were limited to the Otitis Media and Acute URI subpopulations (45.77% and 43.05%, respectively).

CONCLUSIONS Our study illustrates the value of LCA in identifying subgroups of patients with similar patterns of diagnoses and medications. Our results show that, in some subpopulations, LCA-derived classes can simplify predictive models of PHUs without compromising predictive accuracy. Future studies should investigate the value of LCA-derived classes for predicting PHUs in other healthcare settings.
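The PHU definition given in the methods (costs in the top 20% of all patients for four consecutive 6-month periods) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the per-period dictionary layout, and the handling of ties at the cutoff are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def top_share_cutoff(costs: List[float], share: float = 0.20) -> float:
    """Return the lowest cost that still ranks within the top `share` of patients."""
    ranked = sorted(costs, reverse=True)
    k = max(1, int(len(ranked) * share))  # number of patients in the top share
    return ranked[k - 1]


def flag_phus(costs_by_period: List[Dict[str, float]], share: float = 0.20) -> Set[str]:
    """Flag patients whose cost is in the top `share` in every period.

    costs_by_period holds one dict per consecutive 6-month period, mapping a
    hypothetical patient id to that period's cost. Assumption: patients tied
    with the cutoff cost are counted as high utilizers for that period.
    """
    phus = None
    for period in costs_by_period:
        cutoff = top_share_cutoff(list(period.values()), share)
        high = {pid for pid, cost in period.items() if cost >= cutoff}
        # A persistent high utilizer must be a high utilizer in every period.
        phus = high if phus is None else phus & high
    return phus or set()
```

With four period dictionaries covering 2014-2015, `flag_phus(periods)` returns the set of patient ids that stay above the cutoff in all four periods; a patient who drops below it in any single period is excluded.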