Predicting 30-Day Hospital Readmission for Diabetics Based on Spark

2019 
Hospital readmission rates are increasingly used as an outcome measure in health services and as a quality benchmark for health systems. To cope with the rapid growth of medical data, this paper proposes a distributed Random Forest model based on Spark to predict 30-day readmission of diabetic patients. Because class imbalance degrades model performance, the paper implements a distributed Synthetic Minority Over-sampling Technique (SMOTE) algorithm in which the positive-class data are broadcast to all nodes so that the exact k nearest neighbors of each minority sample can be found; a distributed Isolation Forest algorithm is then developed to remove abnormal samples. The Gini index of the Random Forest is used to assess feature importance, providing support for professionals. Experimental results show that, by balancing the classes and removing outliers, the model achieves better classification performance than traditional stand-alone models. Moreover, the model scales well with cluster size: adding nodes substantially reduces training time and increases the capacity to process massive medical data.
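The broadcast-based SMOTE step can be illustrated with a minimal single-process sketch. It assumes feature vectors are plain lists of floats, simulates Spark partitions as strided slices of the minority indices, and uses a shared list in place of the broadcast variable (in PySpark the analogous calls would be `SparkContext.broadcast` together with `RDD.mapPartitions`). The function names and the `k`, `per_sample`, and `seed` parameters are hypothetical illustrations, not the paper's code. Because every partition sees the whole minority set, the neighbor search is exact rather than restricted to the samples that happen to share a partition.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_partition(indices: List[int], minority: List[Vector], k: int,
                    per_sample: int, rng: random.Random) -> List[Vector]:
    """Create synthetic samples for the minority points whose indices live in
    this partition. `minority` stands in for the broadcast value, so the k-NN
    search covers ALL minority samples, not only the local ones."""
    synthetic: List[Vector] = []
    for i in indices:
        x = minority[i]
        # Exact k nearest minority neighbours, excluding the point itself.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(x, minority[j]))
        neighbours = others[:k]
        for _ in range(per_sample):
            nn = minority[rng.choice(neighbours)]
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def distributed_smote(minority: List[Vector], num_partitions: int, k: int = 3,
                      per_sample: int = 1, seed: int = 0) -> List[Vector]:
    """Hypothetical driver; needs at least two minority samples."""
    # Simulated partitioning: partition p holds indices p, p + P, p + 2P, ...
    partitions = [list(range(p, len(minority), num_partitions))
                  for p in range(num_partitions)]
    result: List[Vector] = []
    for p, indices in enumerate(partitions):
        # One RNG per partition, as each Spark task would seed its own.
        rng = random.Random(seed + p)
        result.extend(smote_partition(indices, minority, k, per_sample, rng))
    return result


# Usage: four minority points, two simulated partitions.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = distributed_smote(minority, num_partitions=2, k=2, per_sample=2)
print(len(new_points))  # 8
```

Each synthetic point is a convex combination of a minority sample and one of its minority neighbours, so it always lies on the segment between them.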
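The outlier-removal step relies on Isolation Forest, which scores a sample by how quickly random axis-aligned splits isolate it. The sketch below is a generic, single-process Isolation Forest, not the paper's distributed implementation; it only notes that the trees are independent and could therefore be built on different nodes. Vectors are assumed to be lists of floats, and the helper names, the subsample size of 256, and the removal threshold of 0.7 are hypothetical choices.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points,
    used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(data: List[Vector], depth: int, max_depth: int,
               rng: random.Random) -> Dict:
    if depth >= max_depth or len(data) <= 1:
        return {"size": len(data)}
    # Only split on dimensions that still vary inside this node.
    dims = [d for d in range(len(data[0]))
            if min(p[d] for p in data) < max(p[d] for p in data)]
    if not dims:
        return {"size": len(data)}
    dim = rng.choice(dims)
    lo, hi = min(p[dim] for p in data), max(p[dim] for p in data)
    split = rng.uniform(lo, hi)
    return {"dim": dim, "split": split,
            "left": build_tree([p for p in data if p[dim] < split],
                               depth + 1, max_depth, rng),
            "right": build_tree([p for p in data if p[dim] >= split],
                                depth + 1, max_depth, rng)}


def path_length(x: Vector, node: Dict, depth: int = 0) -> float:
    if "size" in node:  # leaf: add expected depth of the unresolved subtree
        return depth + c(node["size"])
    branch = "left" if x[node["dim"]] < node["split"] else "right"
    return path_length(x, node[branch], depth + 1)


def fit_forest(data: List[Vector], n_trees: int = 100, sample_size: int = 256,
               seed: int = 0) -> Tuple[List[Dict], int]:
    # Trees are independent; here they are simply built one after another.
    rng = random.Random(seed)
    psi = min(sample_size, len(data))  # requires len(data) >= 2
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x: Vector, trees: List[Dict], psi: int) -> float:
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(psi))  # close to 1 means "easy to isolate"


def remove_outliers(data: List[Vector], threshold: float = 0.7,
                    **forest_kwargs) -> List[Vector]:
    trees, psi = fit_forest(data, **forest_kwargs)
    return [x for x in data if anomaly_score(x, trees, psi) <= threshold]


# Usage: a tight 6x6 grid plus one distant point.
cluster = [[i * 0.1, j * 0.1] for i in range(6) for j in range(6)]
data = cluster + [[10.0, 10.0]]
cleaned = remove_outliers(data)
print(len(data), len(cleaned))
```

The distant point is usually isolated by the very first split, so its average depth is short and its score is high; grid points need several splits and score lower.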
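Gini-based feature importance (mean decrease in impurity) credits each feature with the weighted drop in Gini impurity produced by every split that uses it. The sketch below computes this from a list of split records; the record layout and the feature names `num_inpatient_visits` and `age_group` are hypothetical. Libraries differ in whether they normalise per tree before averaging; this sketch normalises once over all splits. In Spark MLlib, a trained `RandomForestClassificationModel` exposes comparable values through `featureImportances`.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((cnt / n) ** 2 for cnt in Counter(labels).values())


# Hypothetical split record: (feature, labels at the node, labels sent left,
# labels sent right, number of training samples in that tree).
Split = Tuple[str, List[int], List[int], List[int], int]


def gini_importance(splits: List[Split]) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    for feature, node, left, right, n_total in splits:
        # Impurity decrease of this split, weighted by the share of samples.
        decrease = (len(node) * gini(node)
                    - len(left) * gini(left)
                    - len(right) * gini(right)) / n_total
        totals[feature] += decrease
    norm = sum(totals.values())
    return {f: v / norm for f, v in totals.items()}


# Usage: two hypothetical splits (1 = readmitted within 30 days).
splits = [
    ("num_inpatient_visits", [0, 0, 1, 1], [0, 0], [1, 1], 4),
    ("age_group", [0, 0, 1], [0], [0, 1], 4),
]
print(gini_importance(splits))
# num_inpatient_visits ≈ 0.857, age_group ≈ 0.143
```

The first split separates the classes perfectly (decrease 0.5), while the second only removes part of the impurity (decrease 1/12), so after normalisation the first feature receives 6/7 of the total importance.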