New Execution Paradigm for Data-Intensive Scientific Workflows

2009 
With the advent of Grid and service-oriented technologies, scientific workflows have been introduced in response to researchers' increasing demand for assembling diverse, highly specialized applications and allowing them to exchange large heterogeneous datasets in order to accomplish a complex scientific task. Much research has already gone into providing efficient scientific workflow management systems (WfMS). However, most such WfMS coordinate and execute workflows in a centralized fashion. This creates a single point of failure, forms a scalability bottleneck, and often leads to excessive traffic routed back to the coordinator. Additionally, none of the available WfMS provides means for dynamic data transformation between services to overcome the data heterogeneity problem. This work presents a new approach to scientific workflow management that targets efficient distributed execution of data-intensive workflows. The proposed approach reduces the communication traffic between services and overcomes the data heterogeneity problem. Moreover, it allows full control over long-running applications and supports smart re-run, distributed fault handling, and distributed load balancing.