On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems

2020 
Large-scale high-performance computing (HPC) systems consist of massive compute and memory resources tightly coupled in nodes. We perform a large-scale study of memory utilization on four production HPC clusters. Our results show that more than 90% of jobs utilize less than 15% of the node memory capacity, and for 90% of the time, memory utilization is less than 35%. Recently, disaggregated architecture is gaining traction because it can selectively scale up a resource and improve resource utilization. Based on these observations, we explore using disaggregated memory to support memory-intensive applications, while most jobs remain intact on HPC systems with reduced node memory. We designed and developed a user-space remote-memory paging library to enable applications exploring disaggregated memory on existing HPC clusters. We quantified the impact of access patterns and network connectivity in benchmarks. Our case studies of graph-processing and Monte-Carlo applications evaluated the impact of application characteristics and local memory capacity and highlighted the potential of throughput scaling on disaggregated memory.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    18
    References
    3
    Citations
    NaN
    KQI
    []