Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs

2014 
Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the Fermion Matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverting multiple vectors at the same time, we obtain a performance greater than 300 GFlop/s on both architectures. This more than doubles the performance of the inversions. We also give a short overview of the Knights Corner architecture, discuss some details of the implementation and the effort required to obtain the achieved performance.
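
The idea of inverting multiple vectors at the same time can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes a small dense symmetric positive definite matrix stored as a list of rows, uses pure Python, and all names (`matvec_multi`, `cg_multi_rhs`, `tol`, `max_iter`) are hypothetical. The point it shows is that each matrix row is loaded once per iteration and reused for every right-hand side, which is the kind of data reuse that raises arithmetic intensity on an accelerator.

```python
def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec_multi(A, X):
    """Apply A to every vector in X in a single sweep over the rows of A.

    Each row is read once and reused for all right-hand sides.
    """
    Y = [[0.0] * len(A) for _ in X]
    for i, row in enumerate(A):
        for k, x in enumerate(X):
            Y[k][i] = dot(row, x)
    return Y


def cg_multi_rhs(A, B, tol=1e-10, max_iter=1000):
    """Run independent CG iterations for A x_k = b_k, sharing the matrix sweep.

    Assumes A is symmetric positive definite (illustrative assumption).
    """
    n_rhs = len(B)
    X = [[0.0] * len(b) for b in B]      # initial guess x_k = 0
    R = [list(b) for b in B]             # residuals r_k = b_k - A x_k
    P = [list(r) for r in R]             # search directions
    rr = [dot(r, r) for r in R]
    limit = [tol * tol * dot(b, b) for b in B]

    for _ in range(max_iter):
        if all(rr[k] <= limit[k] for k in range(n_rhs)):
            break
        AP = matvec_multi(A, P)          # one matrix sweep for all vectors
        for k in range(n_rhs):
            if rr[k] <= limit[k]:
                continue                 # this system has already converged
            alpha = rr[k] / dot(P[k], AP[k])
            X[k] = [x + alpha * p for x, p in zip(X[k], P[k])]
            R[k] = [r - alpha * ap for r, ap in zip(R[k], AP[k])]
            rr_new = dot(R[k], R[k])
            beta = rr_new / rr[k]
            P[k] = [r + beta * p for r, p in zip(R[k], P[k])]
            rr[k] = rr_new
    return X


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    B = [[1.0, 2.0], [2.0, 1.0]]
    for x in cg_multi_rhs(A, B):
        print(x)
```

In this sketch each right-hand side keeps its own step sizes, so the results match those of separate single-vector solves; only the matrix application is shared.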