Slicing and Dicing OpenHPC Infrastructure: Virtual Clusters in OpenStack

2019 
University research computing centers are increasingly faced with the need to support applications that are better suited to cloud infrastructure than to HPC infrastructure. A common approach is to shoehorn cloud-based applications onto the university's existing HPC system, which has been done with varying levels of success. Another approach has been to create stand-alone HPC systems and private cloud systems, resulting in ineffective use of resources. A more recent approach has been to use hybrid systems in which the HPC system "bursts" excess jobs to private cloud nodes configured as bare-metal nodes built from the same (expensive) hardware as the HPC system. This paper explores another model, namely the use of private cloud infrastructure (built from inexpensive commodity networks and storage systems) to host both HPC jobs and VMs simultaneously. Utilizing VMs allows these emerging applications to leverage cloud frameworks specifically designed for them (e.g., OpenStack, Kubernetes, Mesos, Hadoop, and Spark), while at the same time effectively supporting a growing percentage of HPC jobs (e.g., single-node jobs and embarrassingly parallel jobs). Because the system can be constructed from commodity cloud networks and storage, it makes cost-effective use of resources, in contrast to HPC systems whose expensive resources go unused (wasted) by jobs that do not need them. To demonstrate the advantages of using cloud infrastructure for both cloud applications and HPC applications, we describe a system that can dynamically launch OpenHPC systems on commodity OpenStack infrastructure. Moreover, users can use the system to deploy "personal" OpenHPC clusters, customized to their application's needs (e.g., number of nodes, cores per node, memory per node).
We have used the system to effectively run OpenHPC workloads on a cluster of large-memory OpenStack nodes, allowing users to create, for example, a large-memory HPC-style cluster of 500 GB nodes running OpenHPC and a cluster of 1 TB VMs operating simultaneously. Performance degradation due to virtualization has been insignificant, particularly when compared to the advantages of being able to use optimized frameworks running on cost-effective hardware.