Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures

2013 
Most recent HPC platforms have heterogeneous nodes composed of a combination of multi-core CPUs and accelerators, like GPUs. Scheduling on such architectures relies on a static partitioning and cost model. In this paper, we present a locality-aware work stealing scheduler for multi-CPU and multi-GPU architectures, which relies on the XKaapi runtime system. We show performance results on two dense linear algebra kernels, Cholesky (POTRF) and LU (GETRF) factorization, to evaluate our scheduler on a heterogeneous architecture composed of two hexa-core CPUs and eight NVIDIA Fermi GPUs. Our experiments show that online locality-aware scheduling achieves performance as good as static strategies, and in most cases outperforms them.
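
To make the idea of locality-aware work stealing concrete, the following is a minimal sketch in Python. It is not the XKaapi API and not necessarily the heuristic used in the paper: the names `Worker`, `locality_score`, and `steal` are hypothetical, and the policy shown (an idle worker steals the queued task with the most input tiles already resident in its own memory) is one simple assumption about what "locality-aware" could mean.

```python
from collections import deque


class Worker:
    """Hypothetical worker (a CPU core or a GPU) with its own task queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()    # pending tasks as (task_name, set_of_tiles)
        self.resident = set()   # tiles currently held in this worker's memory


def locality_score(tiles, worker):
    """Count how many of a task's tiles are already resident on the worker."""
    return sum(1 for tile in tiles if tile in worker.resident)


def steal(thief, victims):
    """Steal the queued task with the best locality score for the thief.

    Assumed policy: scan every victim queue and take the task that would
    need the fewest data transfers. Ties go to the first task found.
    Returns the stolen (task_name, tiles) pair, or None if nothing is queued.
    """
    best = None  # (score, victim, index in victim's queue)
    for victim in victims:
        for index, (_, tiles) in enumerate(victim.queue):
            score = locality_score(tiles, thief)
            if best is None or score > best[0]:
                best = (score, victim, index)
    if best is None:
        return None
    _, victim, index = best
    task = victim.queue[index]
    del victim.queue[index]
    # After running the task, its tiles are assumed to live on the thief.
    thief.resident.update(task[1])
    return task
```

A real runtime would not scan every queue on each steal and would also weigh transfer cost against the speed of each device; the sketch only shows how data placement can guide the choice of which task to steal.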