Predicting the Survival of Heart Failure Patients in Unbalanced Data Sets

2021 
Heart failure is a serious cardiovascular condition that affects the lives of millions of people. Early diagnosis is extremely important for its treatment, and survival analysis of patients provides information that supports early diagnosis and treatment. In this study, a survival analysis of heart failure patients was performed. Using the correlation matrix and Random Forest methods, the features most relevant to death status were determined to be serum creatinine, ejection fraction, and age. Patient follow-up time was excluded because it is not known when the survival analysis is performed. Resampling methods were applied because of the uneven class distribution in the data set. The experiments showed that prediction success is higher when data cleaning is applied together with resampling, and that eliminating the class imbalance in the data set increased the success of the classifier. The oversampling method performed best with the Random Forest algorithm (84.51%), while the undersampling method performed slightly better with the Extra Trees algorithm (84.58%).
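The two resampling strategies mentioned above can be illustrated with a minimal sketch. The abstract does not say which resampling algorithms were used, so plain random oversampling and random undersampling are assumed here; the function names and the toy rows (serum creatinine, ejection fraction, age) are hypothetical and are not taken from the study's data.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class has as
    many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy, made-up rows: [serum_creatinine, ejection_fraction, age].
# Label 0 = survived, 1 = died; the classes are deliberately unbalanced.
rows = [
    [1.0, 38, 55], [0.9, 45, 60], [1.1, 35, 50], [1.2, 40, 65],
    [0.8, 60, 42], [1.0, 50, 58], [2.7, 20, 75], [1.9, 25, 82],
]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(labels))
print(Counter(over_labels))
print(Counter(under_labels))
```

On this toy set, oversampling leaves both classes with six rows and undersampling leaves both with two. In practice the resampling would be applied only to the training split, so that duplicated or discarded rows do not affect the evaluation of the classifier.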