Monadic Composition for Deterministic, Parallel Batch Processing

2017 
Achieving determinism on real software systems remains difficult. Everyday code interacts with a wide array of nondeterministic sources including those internal to the system (OS system calls, CPU instructions, other processes), external to the system (time, date, network communication), and arising from software abstractions (concurrency, threaded execution, data races). Nondeterminism complicates many tasks, from achieving reliable software builds across systems, to creating reproducible scientific results. Existing approaches to enforcing determinism assume that source code is available or changing the operating system is possible. Several approaches have high overhead as runtime monitoring causes performance to suffer. In this work we present DetFlow, a framework for writing and deploying new and legacy software which guarantees determinism. DetFlow uses a novel approach combining static, language-level guarantees with a lightweight runtime enforcement system. Applications that leverage DetFlow must have an entrypoint that lives in the DetIO monad, a type which requires all operations—including I/O—be deterministic. Furthermore, DetFlow allows the execution of arbitrary code not written in this framework by executing it in a determinizing runtime. Combining support for deterministic parallelism, filesystem access, and logging, DetFlow is an ideal platform for writing scripted workflows that process large data sets simultaneously. We show several use cases of DetFlow by applying it to bioinformatics data pipelines and software build systems. Our evaluation shows we can determinize existing software with minimal modifications, while preserving performance and exploiting software parallelism. We show that DetFlow makes it easier to discover nondeterminism and data races sooner, as DetFlow forces programmers to get reproducibility and parallelism right from the onset.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []