Operating and Runtime Systems Challenges for HPC Systems

2017 
Future HPC systems will be characterized by extreme heterogeneity. We will see increasing heterogeneity in virtually every aspect of node architecture from computational engines to memory systems. We will see increasing heterogeneity in applications, including heterogeneity within applications (as previously independent applications are composed to build new applications). We will see increasing heterogeneity in system usage models; in some cases, the HPC system is not the most precious resource being managed. We will also see increasing heterogeneity in the shared services (e.g., storage and visualization systems) that are connected to HPC systems. All of this increasing heterogeneity is certain to create new challenges in the design and implementation of operating and runtime systems. There will be new kinds of resources to manage and many resource management tactics will be invented (and some re-discovered and adapted) to address the new heterogeneity. In essence, we will tacitly agree that the operating and runtime systems need to adapt to enable the inevitable integration of new technologies, applications, usage models, and shared services. While this agreement is critical for our ability to make incremental progress, we, as a community, must step back and ask the relevant question: Does the OS or runtime system bear the brunt of the adaptation, or will we be able to insist on changes in the technologies, applications, and environment? In the past decade, we have seen a similar tradeoff play out between the application teams and the architects of computational engines: how much floating point precision is required and how is this precision implemented? How can we define similar tradeoffs that are important in the design and implementation of operating and runtime systems?
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []