High-performance remote access to climate simulation data: a challenge problem for data grid technologies

2003 
In numerous scientific disciplines, terabyte and petabyte-scale data collections are emerging as critical community resources. A new class of "data grid" infrastructure is required to support management, transport, distributed access to, and analysis of these datasets by potentially thousands of users. Researchers who face this challenge include the climate modeling community, which performs long-duration computations accompanied by frequent output of very large files that must be further analyzed. We describe the Earth System Grid-I prototype, which brings together advanced analysis, replica management, data transfer, request management, and other technologies to support high-performance, interactive analysis of replicated data. We present performance results that demonstrate our ability to manage the location and movement of large datasets from the user's desktop. We report on experiments conducted over SciNET at SC'2000, where we achieved peak performance of 1.55 Gb/s and sustained performance of 512.9 Mb/s for data transfers between Texas and California. Finally, we describe the development of the next-generation Earth System Grid-II (ESG-II) project. Important issues for ESG-II include security requirements for production environments, efficient data filtering and transport, metadata services for discovery of relevant climate datasets, and sophisticated request or workflow management for complex tasks.