High productivity processing - Engaging in big data around distributed computing

2013 
The steadily increasing volume of scientific data and the analysis of `big data' are fundamental characteristics of computational simulations based on numerical methods or known physical laws. This represents both an opportunity and a challenge, on several levels, for traditional distributed computing approaches, architectures, and infrastructures. On the lowest level, data-intensive computing is a challenge because CPU speed has outpaced the I/O capabilities of HPC resources; on higher levels, complex cross-disciplinary data sharing via data infrastructures is envisioned in order to bring together fragmented answers to societal challenges. This paper highlights how these levels share a demand for `high productivity processing' of `big data', including the sharing and analysis of large-scale science data-sets. The paper describes approaches such as the high-level European data infrastructure EUDAT, as well as low-level requirements arising from HPC simulations used in distributed computing. It addresses the fact that big data analysis methods such as computational steering and visualization, map-reduce, and R already exist, but that considerable research and evaluation are still needed before they yield scientific insights in the context of traditional distributed computing infrastructures.
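
As a concrete illustration of the map-reduce style of analysis the abstract names, the following minimal Python sketch computes per-variable means over a toy science data-set using separate map, shuffle, and reduce phases. All names (records, map_phase, reduce_phase) and the data are illustrative assumptions, not drawn from the paper, EUDAT, or any specific framework; real systems distribute the shuffle and reduce steps across nodes.

```python
# Minimal map-reduce sketch (illustrative, not from the paper):
# compute a per-variable mean from (key, value) records.
from collections import defaultdict
from functools import reduce

# Toy records standing in for a large-scale science data-set.
records = [
    ("temperature", 21.5), ("pressure", 101.2),
    ("temperature", 22.1), ("pressure", 100.8),
    ("temperature", 20.9),
]

def map_phase(record):
    """Emit (key, (sum, count)) pairs so partial results can be merged."""
    key, value = record
    return key, (value, 1)

def reduce_phase(a, b):
    """Combine two partial (sum, count) aggregates.
    This is associative, so it can run in parallel across nodes."""
    return a[0] + b[0], a[1] + b[1]

# Shuffle: group mapped pairs by key (handled by the framework in real systems).
grouped = defaultdict(list)
for key, partial in map(map_phase, records):
    grouped[key].append(partial)

# Reduce each key's partial aggregates to a mean.
means = {}
for key, partials in grouped.items():
    total, count = reduce(reduce_phase, partials)
    means[key] = total / count

print(means)  # {'temperature': 21.5, 'pressure': 101.0}
```

Because the reduce step is associative, partial aggregates can be computed where the data resides and merged afterwards, which is exactly the property that makes this pattern attractive when I/O, rather than CPU speed, is the bottleneck.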