Novel Data Imputation for Multiple Types of Missing Data in Intensive Care Units

2019 
The diversity and number of parameters monitored in an intensive care unit (ICU) make the resulting databases highly susceptible to quality issues, such as missing information and erroneous data entry, which adversely affect downstream processing and predictive modeling. Missing data interpolation and imputation techniques, such as multiple imputation, expectation maximization, and hot-deck imputation, do not account for the type of missing data, which can lead to bias. In our study, we first model missing data as three types: “neglectable,” a.k.a. “missing completely at random”; “recoverable,” a.k.a. “missing at random”; and “not easily recoverable,” a.k.a. “missing not at random.” We then design imputation techniques for each type of missing data. Using a publicly available database (MIMIC II), we demonstrate how these imputations perform when random forests are used for prediction. Our results indicate that these novel imputation techniques outperformed standard mean filling and expectation maximization in predicting ICU mortality, with statistical significance (p ≤ 0.01).
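Why the type of missing data matters can be illustrated with a small simulation. The sketch below is not the authors' imputation method, which the abstract does not describe. It uses hypothetical synthetic data (a covariate `x` and a measurement `y = x + noise`), removes about 30% of `y` under each of the three mechanisms, and compares plain mean filling with ordinary least-squares regression imputation from `x`, a standard technique swapped in here for illustration. All function names are hypothetical. In this toy setting, mean filling keeps the mean of `y` roughly unbiased only under MCAR ("neglectable"); regression on the observed covariate recovers it under MAR ("recoverable"); and neither method recovers it under MNAR ("not easily recoverable").

```python
import random
import statistics


def simulate(n, rng):
    """Hypothetical data: covariate x ~ N(0, 1) and a measurement y = x + N(0, 1) noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def missing_mask(mechanism, xs, ys, rng, rate=0.3):
    """Return True where y is missing; the overall missing rate is about `rate`."""
    if mechanism == "MCAR":  # independent of every value
        return [rng.random() < rate for _ in ys]
    if mechanism == "MAR":  # depends only on the observed covariate x
        return [x > 0 and rng.random() < 2 * rate for x in xs]
    if mechanism == "MNAR":  # depends on the unobserved value of y itself
        return [y > 0 and rng.random() < 2 * rate for y in ys]
    raise ValueError(f"unknown mechanism: {mechanism}")


def mean_fill(xs, ys, missing):
    """Replace each missing y with the mean of the observed y values."""
    fill = statistics.fmean(y for y, m in zip(ys, missing) if not m)
    return [fill if m else y for y, m in zip(ys, missing)]


def regression_fill(xs, ys, missing):
    """Replace each missing y with an OLS prediction from x, fitted on observed rows only."""
    obs = [(x, y) for x, y, m in zip(xs, ys, missing) if not m]
    mx = statistics.fmean(x for x, _ in obs)
    my = statistics.fmean(y for _, y in obs)
    slope = (sum((x - mx) * (y - my) for x, y in obs)
             / sum((x - mx) ** 2 for x, _ in obs))
    intercept = my - slope * mx
    return [intercept + slope * x if m else y for x, y, m in zip(xs, ys, missing)]


def imputation_bias(n=20000, seed=0):
    """Bias of the mean of y after imputation (imputed minus true), per mechanism and method."""
    results = {}
    for mechanism in ("MCAR", "MAR", "MNAR"):
        rng = random.Random(seed)
        xs, ys = simulate(n, rng)
        missing = missing_mask(mechanism, xs, ys, rng)
        true_mean = statistics.fmean(ys)
        results[mechanism] = {
            method.__name__: statistics.fmean(method(xs, ys, missing)) - true_mean
            for method in (mean_fill, regression_fill)
        }
    return results


if __name__ == "__main__":
    for mechanism, biases in imputation_bias().items():
        print(mechanism, {name: round(b, 3) for name, b in biases.items()})
```

With these settings, the MAR and MNAR rows should show a clearly negative bias for mean filling, while regression filling stays near zero only under MCAR and MAR. The bias of the mean is only one diagnostic; mean filling also shrinks the variance even under MCAR.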