Data Imputation with an Improved Robust and Sparse Fuzzy K-Means Algorithm

2019 
Missing data may be one of the biggest problems hindering modern scientific research. It occurs frequently, for various reasons, and slows down the data analytics needed to answer important questions about global issues such as climate change and water management. The modern answer to missing data is data imputation, specifically imputation with advanced machine learning techniques. Unfortunately, Fuzzy K-Means clustering, an approach with demonstrated success at accurate imputation, is notoriously slow compared with other algorithms. This paper addresses this weakness of an otherwise promising method by proposing a Robust and Sparse Fuzzy K-Means algorithm that runs on multiple GPUs. We demonstrate the effectiveness of our implementation with multiple experiments that cluster real environmental sensor data. These experiments show that our improved multi-GPU implementation is significantly faster than sequential implementations, reaching a 185x speedup with 8 GPUs. The experiments also indicated a more than 300x increase in throughput with 8 GPUs, and 95% parallel efficiency with two GPUs compared with one.
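The abstract does not spell out how fuzzy clustering fills in gaps. As a rough illustration only, the sketch below implements plain (standard) fuzzy c-means imputation in pure Python. It is not the authors' Robust and Sparse variant and not their multi-GPU code. Distances use only each row's observed features, centroids are updated from observed entries weighted by u_ik^m, and each missing value is set to the membership-weighted average of the centroids. The function name `fuzzy_kmeans_impute`, its parameters, the fuzzifier m = 2, the fixed iteration count, and the evenly spaced initialisation are all hypothetical choices made for this example.

```python
def fuzzy_kmeans_impute(data, k=2, m=2.0, iters=100, eps=1e-9):
    """Hypothetical sketch: impute None entries with standard fuzzy c-means.

    `data` is a list of equal-length rows; None marks a missing value.
    Assumes every column has at least one observed value.
    """
    n, d = len(data), len(data[0])

    # Column means of observed values, used to fill gaps in the initial centroids.
    col_means = []
    for j in range(d):
        vals = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(vals) / len(vals))

    # Simple deterministic initialisation: k evenly spaced rows (an assumption).
    init = [int(a * n / k) for a in range(k)]
    centroids = [[data[i][j] if data[i][j] is not None else col_means[j]
                  for j in range(d)] for i in init]
    u = [[1.0 / k] * k for _ in range(n)]

    for _ in range(iters):
        # Membership update: u_ia = 1 / sum_b (dist_a / dist_b)^(2 / (m - 1)),
        # with distances computed over observed features only.
        for i, row in enumerate(data):
            dist = []
            for c in centroids:
                s = sum((row[j] - c[j]) ** 2 for j in range(d) if row[j] is not None)
                dist.append(max(s, eps) ** 0.5)  # eps avoids division by zero
            for a in range(k):
                u[i][a] = 1.0 / sum((dist[a] / dist[b]) ** (2.0 / (m - 1.0))
                                    for b in range(k))

        # Centroid update: weighted mean of observed entries, weights u_ia^m.
        for a in range(k):
            for j in range(d):
                num = den = 0.0
                for i, row in enumerate(data):
                    if row[j] is not None:
                        w = u[i][a] ** m
                        num += w * row[j]
                        den += w
                centroids[a][j] = num / den if den > 0 else col_means[j]

    # Keep observed values; fill each gap with the membership-weighted centroid value.
    return [[row[j] if row[j] is not None
             else sum(u[i][a] * centroids[a][j] for a in range(k))
             for j in range(d)]
            for i, row in enumerate(data)]


data = [[1.0, 2.0], [1.1, None], [0.9, 2.1],
        [8.0, 9.0], [8.2, None], [7.9, 9.1]]
filled = fuzzy_kmeans_impute(data)
print(filled[1][1], filled[4][1])  # gaps filled near 2 and near 9 respectively
```

Every row's membership update depends only on that row and the current centroids, so the loop over rows is embarrassingly parallel; this is the kind of work a GPU implementation can spread across devices, though how the paper partitions it is not described in this abstract.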