CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC

2016 
CUDA 6.0 introduced the Managed Memory feature to boost productivity on GPU systems. It removes the burden of explicit memory management and data movement between the host and the accelerator from the programmer. However, these benefits come at the cost of restricting memory pinning, which limits performance by precluding the use of performance-centric features such as CUDA-IPC and GPUDirect RDMA (GDR). On the other hand, CUDA-Aware MPI runtimes have continuously improved the performance of data movement from/to native GPU memory allocations. In this paper, to maximize the productivity and performance potential of GPU systems, we propose a novel CUDA M3 framework. We investigate and propose efficient designs that introduce Managed Memory awareness into CUDA-Aware MPI. To do so, we analyze the behavior of managed memory and define a locality property. We propose novel schemes to optimize intra-node and inter-node communication using the CUDA-IPC and GDR features. To the best of our knowledge, this is the first work to design Managed Memory-aware data movement schemes that exploit CUDA-IPC and GDR. Our performance evaluation with micro-benchmarks on a CS-Storm system shows up to 32X improvement for the intra-node configuration and up to 7X for the inter-node configuration. With a real-world application, GPULBM, our designs show up to 1.92X improvement on 96 GPUs.
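The productivity benefit the abstract refers to can be illustrated by contrasting explicit device allocation with managed allocation. The sketch below (hypothetical kernel name `scale`; assumes CUDA 6.0+ and a Unified-Memory-capable GPU, and is not the paper's own code) shows how `cudaMallocManaged` removes the staging copies, while the resulting buffer is not pinned host memory, which is what deprives it of CUDA-IPC and GDR fast paths:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration.
__global__ void scale(float *a, float f, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= f;
}

void explicit_path(float *host, int n) {
    // Explicit management: the programmer stages copies to/from device memory.
    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}

void managed_path(int n) {
    // Managed memory: a single allocation is visible to host and device; the
    // driver migrates pages on demand, so no cudaMemcpy calls are needed.
    // The buffer is not pinned, which restricts CUDA-IPC / GPUDirect RDMA use.
    float *buf;
    cudaMallocManaged(&buf, n * sizeof(float));
    for (int i = 0; i < n; ++i) buf[i] = (float)i;   // host writes directly
    scale<<<(n + 255) / 256, 256>>>(buf, 2.0f, n);
    cudaDeviceSynchronize();                          // sync before host reads
    cudaFree(buf);
}
```

A CUDA-Aware MPI runtime passed `buf` from `managed_path` cannot simply register it for RDMA the way it would a `cudaMalloc` buffer, which is the gap the paper's Managed Memory-aware designs address.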