rmalloc() and rpipe(): a uGNI-based Distributed Remote Memory Allocator and Access Library for One-sided Messaging

Udayanga Wickramasinghe, Andrew Lumsdaine

2018

Optimizing communication is essential for high-performance computing because synchronization bottlenecks limit the overall performance and scalability of parallel applications. Today's cutting-edge computing hardware and network interfaces such as Cray Aries/Gemini offer remote memory access (RMA) operations with extremely low latency and high bandwidth for optimized data movement. However, for efficient data movement to occur between two logical processing units, software substrates must properly exploit the hardware resources of the underlying fabric. Overheads due to coarse-grained synchronization and stalls during irregular access to remote memory regions may indicate two adverse effects of resource under-utilization, one in time and one in space. We introduce a uGNI-based distributed remote memory allocator called "rmalloc", which expands RDMA-enabled memory utilization, and a communication substrate called "rpipe", which aims to mitigate synchronization bottlenecks. Our UNIX-inspired RMA programming model is simple to use and applies equally to higher-level applications and to lower-level runtime systems that need efficient data movement. Our micro-benchmark results suggest that the default next-fit "rmalloc" allocator outperforms MPI-3.0 RMA by 1.5X and up to 6X in most cases, while other variants of "rmalloc" (i.e., best-fit and worst-fit) reduce external fragmentation and perform comparably to or better than the default "rmalloc" allocator for irregular RMA.
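
The abstract names three placement policies (next-fit, best-fit, worst-fit) without showing how they differ. The sketch below illustrates the textbook versions of these policies over a small array of free blocks. It is not the "rmalloc" implementation or API: the names `fit_policy`, `free_block`, and `choose_block` are hypothetical, and the free list is assumed to be a plain array of offsets into an already registered memory region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not part of rmalloc. */
typedef enum { NEXT_FIT, BEST_FIT, WORST_FIT } fit_policy;

typedef struct {
    size_t offset; /* start of the free block inside the registered region */
    size_t size;   /* bytes available in the block */
} free_block;

/*
 * Pick a free block that can hold `request` bytes.
 * Returns the index of the chosen block, or -1 if none fits.
 * `cursor` is the roving start position used only by next-fit;
 * it is updated to the chosen index when next-fit succeeds.
 */
int choose_block(const free_block *blocks, int n, size_t request,
                 fit_policy policy, int *cursor)
{
    assert(blocks != NULL && cursor != NULL && n >= 0 && *cursor >= 0);

    if (policy == NEXT_FIT) {
        /* Resume scanning where the previous search stopped and
         * take the first block that is large enough. */
        for (int i = 0; i < n; i++) {
            int idx = (*cursor + i) % n;
            if (blocks[idx].size >= request) {
                *cursor = idx;
                return idx;
            }
        }
        return -1;
    }

    /* Best-fit and worst-fit both scan the whole list. */
    int chosen = -1;
    for (int i = 0; i < n; i++) {
        if (blocks[i].size < request)
            continue;
        if (chosen < 0) {
            chosen = i;
        } else if (policy == BEST_FIT && blocks[i].size < blocks[chosen].size) {
            chosen = i; /* smallest block that still fits */
        } else if (policy == WORST_FIT && blocks[i].size > blocks[chosen].size) {
            chosen = i; /* largest block available */
        }
    }
    return chosen;
}
```

Next-fit stops at the first adequate block after the cursor, so its search cost is usually lower, while best-fit and worst-fit pay for a full scan in exchange for more control over which holes are left behind. That trade-off is consistent with the abstract's remark that the non-default variants reduce external fragmentation.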
