Impact of selected pre-processing techniques on prediction of risk of early readmission for diabetic patients in India

2016 
Diabetes is associated with an increased risk of hospital readmission. Predicting the risk of readmission for diabetic patients can help in implementing appropriate plans to prevent these readmissions. However, real-world medical data is noisy, inconsistent, and incomplete, so before building a prediction model it is essential to pre-process the data efficiently and make it suitable for predictive modelling. The objective of this study is to assess the impact of selected pre-processing techniques on the prediction of risk of 30-day readmission among patients with diabetes in India. De-identified electronic medical records from a reputed hospital in the National Capital Region of India were used, covering diabetes patients ≥18 years old discharged from hospital between 2012 and 2015 (n = 9381). The paper focuses on data pre-processing steps to improve readmission prediction outcomes. The impact of different pre-processing choices, including feature selection, missing value imputation, and data balancing, on the performance of logistic regression, Naive Bayes, and decision tree classifiers was assessed using metrics such as area under the curve (AUC), precision, recall, and accuracy. This comprehensive experimental study, the first conducted from an Indian healthcare perspective, offers empirical evidence that most of the proposed models with pre-processing significantly outperform the baseline methods (without any pre-processing) on the selected evaluation criteria. AUC increased markedly with oversampling, because the data is skewed on the class label Readmission. Recall gained the most, with its range rising from 0.02–0.23 to 0.78–0.85, and the AUC range rose from 0.56–0.68 to 0.83–0.86 with pre-processing. Data pre-processing has a significant effect on hospital readmission predictive accuracy for patients with diabetes, with certain schemes proving inferior to competing approaches.
In addition, the impact of pre-processing schemes is found to vary by technique, which suggests formulating different best practices for each technique to obtain better results.
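Two of the pre-processing steps named in the abstract, missing value imputation and data balancing by oversampling, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say which imputation or oversampling variant was used, so mean imputation and random oversampling are assumed here, and the function names (`impute_mean`, `random_oversample`, `recall`) and the toy data are hypothetical.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace each None with the mean of the observed values in its column."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (invented): two numeric features with gaps, label 1 = readmitted.
X = [[5.0, None], [6.0, 2.0], [7.0, 4.0], [None, 6.0], [9.0, 8.0], [8.0, 1.0]]
y = [0, 0, 0, 0, 1, 0]

X_imp = impute_mean(X)                      # fill missing values first
X_bal, y_bal = random_oversample(X_imp, y)  # then balance the classes
```

In practice, oversampling would be applied only to the training split, so that duplicated minority rows do not leak into the data used to compute recall or AUC.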