Scalable and Fault Tolerant Platform for Distributed Learning on Private Medical Data

2017 
Medical image data is naturally distributed among clinical institutions. This partitioning, combined with security and privacy restrictions on medical data, imposes limitations on machine learning algorithms in clinical applications, especially for small and newly established institutions. We present InsuLearn: an intuitive and robust open-source (open-source code available at: https://github.com/DistributedML/InsuLearn) platform designed to facilitate distributed learning (classification and regression) on medical image data, while preserving data security and privacy. InsuLearn is built on ensemble learning, in which statistical models are developed at each institution independently and combined at secure coordinator nodes. InsuLearn protocols are designed such that the liveness of the system is guaranteed as institutions join and leave the network. Coordination is implemented as a cluster of replicated state machines, making it tolerant to individual node failures. We demonstrate that InsuLearn successfully integrates accurate models for horizontally partitioned data while preserving privacy.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    13
    References
    3
    Citations
    NaN
    KQI
    []