Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs

Olaf Kaczmarek, Christian Joachim Schmidt, Patrick Steinbrecher, Mathias Wagner

2014

Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the fermion matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverting multiple vectors at the same time, we obtain a performance greater than 300 GFlop/s on both architectures. This more than doubles the performance of the inversions. We also give a short overview of the Knights Corner architecture, discuss some details of the implementation and the effort required to obtain the achieved performance.
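
The idea of inverting multiple vectors at the same time can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small dense, real, symmetric positive-definite matrix as a stand-in for the fermion matrix, uses plain Python lists, and the names `matvec_multi`, `col_dots` and `cg_multi` are hypothetical. It only shows the structural point that one matrix application serves all right-hand sides, so each matrix element is loaded once and reused for every vector.

```python
def matvec_multi(A, X):
    """Apply the n x n matrix A to all m columns of X (n x m) in one pass."""
    n, m = len(A), len(X[0])
    Y = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j, a in enumerate(A[i]):
            if a != 0.0:
                # The element a is read once and reused for all m vectors.
                for k in range(m):
                    Y[i][k] += a * X[j][k]
    return Y


def col_dots(U, V):
    """Column-wise dot products of two n x m blocks; returns m values."""
    m = len(U[0])
    return [sum(U[i][k] * V[i][k] for i in range(len(U))) for k in range(m)]


def cg_multi(A, B, tol=1e-10, max_iter=1000):
    """Run m independent CG recurrences that share each application of A.

    A is assumed symmetric positive definite (an assumption of this sketch).
    B holds the m right-hand sides as columns; the initial guess is zero.
    """
    n, m = len(B), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    R = [row[:] for row in B]   # residuals, equal to B for a zero guess
    P = [row[:] for row in B]   # search directions
    rr = col_dots(R, R)
    bb = rr[:]                  # squared norms of the right-hand sides
    for _ in range(max_iter):
        # Stop once every column meets the relative residual tolerance.
        if all(rr[k] <= tol * tol * bb[k] for k in range(m)):
            break
        AP = matvec_multi(A, P)  # one shared matrix application
        pAp = col_dots(P, AP)
        alpha = [rr[k] / pAp[k] if pAp[k] != 0.0 else 0.0 for k in range(m)]
        for i in range(n):
            for k in range(m):
                X[i][k] += alpha[k] * P[i][k]
                R[i][k] -= alpha[k] * AP[i][k]
        rr_new = col_dots(R, R)
        beta = [rr_new[k] / rr[k] if rr[k] != 0.0 else 0.0 for k in range(m)]
        for i in range(n):
            for k in range(m):
                P[i][k] = R[i][k] + beta[k] * P[i][k]
        rr = rr_new
    return X
```

Calling `cg_multi(A, B)` with `B` holding several right-hand sides as columns returns the corresponding solutions column by column. Each column follows its own step sizes, so the results match what separate single-vector solves would give; the potential gain on an accelerator comes only from sharing the memory traffic of the matrix across the vectors.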

Keywords:

Xeon Phi
Parallel computing
CUDA
Mathematical software
Solver
Computational science
Conjugate gradient method
Matrix (mathematics)
Computer science
Lattice field theory
Conjugate gradient solver
