Design and Implementation of Kepler Workflows for BioEarth

2014 
Abstract BioEarth is an ongoing research initiative for the development of a regional-scale Earth System Model (EaSM) for the U.S. Pacific Northwest. Our project seeks to couple and integrate multiple stand-alone EaSMs, developed through independent efforts, that capture natural and human processes in various realms of the biosphere: atmosphere (weather and air quality), terrestrial biota (crop, rangeland, and forest agro-ecosystems), and aquatic (river flows, water quality, and reservoirs); hydrology links all these realms. Because the project must manage numerous complex simulations, automated workflows were essential. In this paper, we present a case study of workflow design for the BioEarth project using the Kepler system to manage applications of the Regional Hydro-Ecologic Simulation System (RHESSys) model. In particular, we report on the design of Kepler workflows to support: 1) standalone executions of the RHESSys model in serial and in parallel, and 2) a more complex case of performing calibration runs involving multiple preprocessing modules, iterative exploration of parameters, and parallel RHESSys executions. We exploited various Kepler features, including a user-friendly design interface and support for parallel execution on a cluster. Our experiments show a performance speedup of 7–12x using 16 cores of a Linux cluster, and demonstrate the general effectiveness of our Kepler workflows in managing RHESSys runs. This study shows the potential of Kepler to serve as the primary integration platform for the BioEarth project, with implications for other data- and compute-intensive Earth systems modeling projects.