CUDA-DTM: Distributed Transactional Memory for GPU Clusters

Samuel Irving, Sui Chen, Lu Peng, Costas Busch, Maurice Herlihy, Chris J. Michael

2019

We present CUDA-DTM, the first Distributed Transactional Memory framework written in CUDA for large-scale GPU clusters. Transactional Memory has become an attractive auto-coherence scheme for GPU applications with irregular memory access patterns because it avoids serializing threads while maintaining programmability. We extend GPU Software Transactional Memory to allow threads across many GPUs to access a coherent distributed shared memory space, and we propose a scheme for GPU-to-GPU communication using CUDA-Aware MPI. We evaluate CUDA-DTM using a suite of seven irregular memory access benchmarks with varying degrees of compute intensity, contention, and node-to-node communication frequency. On a cluster of 256 devices, our experiments show that GPU clusters using CUDA-DTM can be up to 115x faster than CPU clusters.
