Missing values imputation based on fuzzy C-Means algorithm for classification of chronic obstructive pulmonary disease (COPD)

Kiki Aristiawati, Titin Siswantining, Devvi Sarwinda, Saskya Mary Soemartojo

2019

Chronic Obstructive Pulmonary Disease (COPD) is one of the leading causes of death in the world. The World Health Organization (WHO) reported that in 2016 COPD was the third leading cause of death worldwide, with around 3 million deaths, equivalent to 5.2% of deaths worldwide. For this reason, further research on COPD is needed. Unfortunately, the data collected in the study do not contain all the desired values; the absent entries are called missing values. Missing values are a problem for all types of data analysis. There are two main ways to handle them: filtering the data (ignoring or removing incomplete records) and imputing the data. Ignoring or removing data reduces the amount of information contained in the data and can lower the accuracy of the resulting analysis. To overcome this problem, data imputation is carried out at the preprocessing stage to obtain a complete dataset, which is expected to increase the accuracy of the analysis. Many imputation methods can be used, such as mean imputation and Fuzzy C-Means (FCM). Fuzzy C-Means is a clustering method that allows a data point to belong to two or more clusters, according to its membership function. The completed dataset was used to train a Decision Tree classifier in order to compare the accuracy obtained with the mean and FCM imputation methods. The analysis of the proposed imputation on classification shows that FCM is slightly more accurate than mean imputation.
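
The abstract does not spell out how FCM is turned into an imputation rule, so the following is only a minimal sketch of one common scheme, not necessarily the authors' procedure. It assumes that cluster centers are fitted on the complete rows, that memberships of an incomplete row are computed from its observed features only, and that each missing entry is filled with the membership-weighted average of the centers. The function names (`fcm`, `memberships`, `fcm_impute`) and the defaults `c=2`, `m=2.0` are illustrative choices.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Fuzzy memberships of each row of X to each center (rows sum to 1)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                      # avoid division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means on a complete matrix X of shape (n, p)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def fcm_impute(X, c=2, m=2.0):
    """Fill NaN entries using centers fitted on the complete rows (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete_rows = ~missing.any(axis=1)
    centers, _ = fcm(X[complete_rows], c=c, m=m)
    X_imp = X.copy()
    for i in np.where(~complete_rows)[0]:
        obs = ~missing[i]
        # memberships based on the observed features of row i only
        u = memberships(X[i, obs][None, :], centers[:, obs], m)[0]
        # missing entries become a membership-weighted mix of the centers
        X_imp[i, missing[i]] = u @ centers[:, missing[i]]
    return X_imp

# Toy data (made up for illustration): two groups and one missing entry.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
              [5.0, np.nan]])
X_fcm = fcm_impute(X, c=2)
# Mean-imputation baseline for comparison: fill with the column mean.
X_mean = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
```

On this toy matrix the FCM-based fill for the last row stays close to the second group, while the column mean falls between the two groups. This illustrates why a cluster-aware imputation can help a downstream classifier, though it says nothing about the size of the effect on real COPD data.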
