CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC

2016 
CUDA 6.0 introduced the Managed Memory feature to boost productivity on GPU systems. It removes the burden of explicit memory management and data movement between the host and the accelerator from the programmer. However, these benefits come at the cost of restricting memory pinning, which limits performance by precluding the use of performance-centric features such as CUDA-IPC and GPUDirect RDMA (GDR). On the other hand, CUDA-Aware MPI runtimes have continuously improved the performance of data movement from/to native GPU memory allocations. In this paper, to maximize the productivity and performance potential of GPU systems, we propose a novel CUDA M3 framework. We investigate and propose efficient designs that introduce Managed Memory awareness into CUDA-Aware MPI. To do so, we analyze the behavior of managed memory and define a locality property. We propose novel schemes to optimize intra-node and inter-node communication using the CUDA-IPC and GDR features. To the best of our knowledge, this is the first work to design Managed Memory-aware data movement schemes that exploit CUDA-IPC and GDR. Our performance evaluation with micro-benchmarks on a CS-Storm system shows up to 32X improvement for the intra-node configuration and up to 7X for the inter-node configuration. With a real-world application, GPULBM, our designs show up to 1.92X improvement on 96 GPUs.
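The productivity benefit the abstract refers to can be illustrated by contrasting explicit device allocation with managed allocation. The sketch below (hypothetical kernel name `scale`; assumes CUDA 6.0+ and a Unified-Memory-capable GPU, and is not the paper's own code) shows how `cudaMallocManaged` removes the staging copies, while the resulting buffer is not pinned host memory, which is what deprives it of CUDA-IPC and GDR fast paths:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration.
__global__ void scale(float *a, float f, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= f;
}

void explicit_path(float *host, int n) {
    // Explicit management: the programmer stages copies to/from device memory.
    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}

void managed_path(int n) {
    // Managed memory: a single allocation is visible to host and device; the
    // driver migrates pages on demand, so no cudaMemcpy calls are needed.
    // The buffer is not pinned, which restricts CUDA-IPC / GPUDirect RDMA use.
    float *buf;
    cudaMallocManaged(&buf, n * sizeof(float));
    for (int i = 0; i < n; ++i) buf[i] = (float)i;   // host writes directly
    scale<<<(n + 255) / 256, 256>>>(buf, 2.0f, n);
    cudaDeviceSynchronize();                          // sync before host reads
    cudaFree(buf);
}
```

A CUDA-Aware MPI runtime passed `buf` from `managed_path` cannot simply register it for RDMA the way it would a `cudaMalloc` buffer, which is the gap the paper's Managed Memory-aware designs address.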