SciFlow: A dataflow-driven model architecture for scientific computing using Hadoop

2013 
Many computational science applications use complex workflow patterns that generate an intricately connected set of output files for subsequent analysis. Some classes of applications, such as rare event sampling, additionally require guaranteed completion of all subtasks before analysis can proceed, and place significant demands on the workflow management and execution environment. SciFlow is a user interface built over the Hadoop infrastructure; it provides a framework that supports the complex process and data interactions and the guaranteed-completion requirements of scientific workflows. It offers an efficient mechanism for building parallel scientific applications from dataflow patterns, and enables the design, deployment, and execution of data-intensive, many-task computing applications on a Hadoop platform. The design principles of the framework emphasize simplicity, scalability, and fault tolerance. A case study using the forward flux sampling rare event simulation application validates the functionality, reliability, and effectiveness of the framework.
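The abstract does not expose the SciFlow API itself, so the following is only a minimal, hypothetical sketch, in plain Hadoop MapReduce (Java), of the kind of stage such a framework would coordinate: each map task runs one trajectory segment of a forward flux sampling stage from a stored interface configuration, and the reducer tallies how many segments cross the next interface. The class and method names (FfsStageJob, runSegment) are illustrative assumptions, not part of SciFlow.

// Hypothetical sketch of one forward flux sampling stage as a Hadoop
// MapReduce job; not the SciFlow API described in the paper.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FfsStageJob {

  // Each input line is assumed to describe one starting configuration
  // at the current interface.
  public static class TrajectoryMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final Text INTERFACE = new Text("interface");

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Placeholder: a real task would invoke the simulation code and
      // test whether the trajectory segment crossed the next interface.
      int reachedNextInterface = runSegment(value.toString()) ? 1 : 0;
      context.write(INTERFACE, new IntWritable(reachedNextInterface));
    }

    private boolean runSegment(String configuration) {
      return !configuration.isEmpty(); // stand-in for the actual simulation
    }
  }

  // Sums the success flags to give the crossing count for this stage.
  public static class CrossingCountReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int crossings = 0;
      for (IntWritable v : values) {
        crossings += v.get();
      }
      context.write(key, new IntWritable(crossings));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "ffs-stage");
    job.setJarByClass(FfsStageJob.class);
    job.setMapperClass(TrajectoryMapper.class);
    job.setReducerClass(CrossingCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A full rare event sampling workflow chains many such stages, with the outputs of one stage feeding the inputs of the next; the dataflow wiring and guaranteed-completion tracking across stages is the part the paper attributes to SciFlow, and it is not shown in this sketch.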