Predicting COVID-19 infection risk and related risk drivers in nursing homes: A machine learning approach

2020 
ABSTRACT

Objective: To inform COVID-19 infection prevention measures by identifying and assessing risk and possible vectors of infection in nursing homes (NHs) using a machine-learning approach.

Design: This retrospective cohort study used a gradient boosting algorithm to evaluate the risk of COVID-19 infection (i.e., the presence of at least one confirmed COVID-19 resident) in NHs.

Setting and participants: The model was trained on outcomes from 1,146 NHs in Massachusetts, Georgia, and New Jersey that reported COVID-19 case data on April 20th, 2020. Risk indices generated from the model using data from May 4th were prospectively validated against outcomes reported on May 11th from 1,021 NHs in California.

Methods: Model features, pertaining to facility and community characteristics, were obtained from a self-constructed dataset based on multiple public and private sources. The model was assessed via out-of-sample area under the receiver operating characteristic curve (AUC), sensitivity, and specificity in the training (via 10-fold cross-validation) and validation datasets.

Results: The model’s mean AUC, sensitivity, and specificity over 10-fold cross-validation were 0.729 (95% CI: 0.690-0.767), 0.670 (95% CI: 0.477-0.862), and 0.611 (95% CI: 0.412-0.809), respectively. Prospective out-of-sample validation yielded similar performance measures (AUC: 0.721; sensitivity: 0.622; specificity: 0.713). The strongest predictors of COVID-19 infection were the infection rate of the NH’s county and the number of separate units in the NH; other predictors included the county’s population density, historical health deficiencies cited by the Centers for Medicare & Medicaid Services, and the NH’s resident density (in persons per 1,000 square feet). Additionally, the NH’s historical percentage of non-Hispanic White residents was identified as a protective factor.

Conclusions and Implications: A machine-learning model can help quantify and predict NH infection risk. The identified risk factors support the early identification and management of presymptomatic and asymptomatic individuals (e.g., staff) entering the NH from the surrounding community, and the development of financially sustainable staff testing initiatives to prevent COVID-19 infection.
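
The evaluation procedure described in the abstract (a gradient boosting classifier scored by out-of-sample AUC, sensitivity, and specificity over 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn's GradientBoostingClassifier as a stand-in for the unspecified gradient boosting algorithm, a purely synthetic dataset in place of the study's facility and community features, a 0.5 probability threshold for classifying a facility as infected, and a normal-approximation 95% confidence interval across folds. The helper names cross_validate and mean_ci are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold


def cross_validate(X, y, n_splits=10, threshold=0.5, seed=0):
    """Return per-fold AUC, sensitivity, and specificity on held-out folds."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs, sens, spec = [], [], []
    for train_idx, test_idx in folds.split(X, y):
        # Assumption: default hyperparameters; the study's settings are not given.
        model = GradientBoostingClassifier(random_state=seed)
        model.fit(X[train_idx], y[train_idx])

        # Predicted probability of the positive class serves as the risk index.
        risk = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], risk))

        # Assumption: a fixed threshold turns the risk index into a binary call.
        predicted = (risk >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(
            y[test_idx], predicted, labels=[0, 1]
        ).ravel()
        sens.append(tp / (tp + fn))  # true positive rate
        spec.append(tn / (tn + fp))  # true negative rate
    return np.array(aucs), np.array(sens), np.array(spec)


def mean_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI across folds (an assumed method)."""
    mean = values.mean()
    half_width = z * values.std(ddof=1) / np.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


# Synthetic stand-in data: rows play the role of facilities, y = 1 the role of
# "at least one confirmed resident case". No real study data is used here.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

aucs, sens, spec = cross_validate(X, y)
for name, values in [("AUC", aucs), ("sensitivity", sens), ("specificity", spec)]:
    mean, low, high = mean_ci(values)
    print(f"{name}: {mean:.3f} (95% CI: {low:.3f}-{high:.3f})")
```

Stratified folds keep the share of positive facilities roughly equal in every fold, so sensitivity and specificity are defined in each one. The numbers printed for the synthetic data have no bearing on the results reported above.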