Scientific software management in real life: deployment of EasyBuild on a large scale system

2016 
Managing scientific software stacks has traditionally been a manual task that required a sizeable team with knowledge about the specifics of building each application. Keeping the software stack up to date also caused significant overhead for system administrators as well as support teams. Furthermore, a flat module view and the manual creation of modules by different team members can end up presenting a confusing view of the installed software to end users. In addition, on many HPC clusters the OS images have to include auxiliary packages to support components of the scientific software stack, potentially bloating the images of the cluster nodes and restricting the installation of new software to a designated maintenance window. To alleviate this situation, tools like EasyBuild help to manage a large number of scientific software packages in a structured way, decoupling the scientific stack from the OS-provided software and lowering the overall overhead of managing a complex HPC software infrastructure. However, the relative novelty of these tools and the variety of requirements from both users and HPC sites mean that such frameworks still have to evolve and adapt to different environments. In this paper, we report on how we deployed EasyBuild on a cluster with 45K+ cores (JURECA). In particular, we discuss which features were missing in order to meet our requirements, how we implemented them, how the installation, upgrade, and retirement of software are managed, and how this approach is reused for other internal systems. Finally, we outline some enhancements we would like to see implemented in our setup and in EasyBuild in the future.