Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures

Thierry Gautier, Joao Vicente Ferreira Lima, Nicolas Maillard, Bruno Raffin

2013

Most recent HPC platforms have heterogeneous nodes that combine multi-core CPUs with accelerators such as GPUs. Scheduling on such architectures relies on a static partitioning and cost model. In this paper, we present a locality-aware work stealing scheduler for multi-CPU and multi-GPU architectures, which relies on the XKaapi runtime system. We show performance results on two dense linear algebra kernels, Cholesky (POTRF) and LU (GETRF) factorization, to evaluate our scheduler on a heterogeneous architecture composed of two hexa-core CPUs and eight NVIDIA Fermi GPUs. Our experiments show that online locality-aware scheduling achieves performance as good as that of static strategies, and in most cases outperforms them.

Keywords:

Parallel computing
Scheduling (computing)
CUDA
Cholesky decomposition
Linear algebra
Factorization
Architecture
Work stealing
Runtime system
Computer science
Locality
