Performance Portability Strategies for Grid C++ Expression Templates

Peter A. Boyle, M. A. Clark, Carleton DeTar, Meifeng Lin, Verinder Rana, Alejandro Vaquero Avilés-Casco

2017

One of the key requirements for Lattice QCD application development under the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression templates as a starting point, we report on progress with GPU offloading strategies for Grid. We present both the successes and the issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experiments and performance results on GPUs with an SU(3)×SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code.
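As a rough illustration of what an SU(3)×SU(3) streaming test computes, the following is a minimal host-side sketch in plain C++. It is not taken from Grid and does not use its expression templates or any GPU offloading. The names `Su3`, `mul`, `su3_stream` and `identity`, and the row-major one-matrix-per-site layout, are assumptions made for this example only.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
// Hypothetical layout: one 3x3 complex matrix per lattice site, row-major.
using Su3 = std::array<Complex, 9>;

// Multiply two 3x3 complex matrices: c = a * b.
inline Su3 mul(const Su3& a, const Su3& b) {
    Su3 c{};  // value-initialised, so every entry starts at zero
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                c[3 * i + j] += a[3 * i + k] * b[3 * k + j];
    return c;
}

// Streaming kernel: z[s] = x[s] * y[s] independently for every site s.
// This site loop is the part an offloading strategy would map to the device.
void su3_stream(std::vector<Su3>& z,
                const std::vector<Su3>& x,
                const std::vector<Su3>& y) {
    assert(x.size() == y.size() && z.size() == x.size());
    for (std::size_t s = 0; s < z.size(); ++s) {
        z[s] = mul(x[s], y[s]);
    }
}

// Helper for testing: the 3x3 identity matrix.
inline Su3 identity() {
    Su3 m{};
    m[0] = m[4] = m[8] = Complex(1.0, 0.0);
    return m;
}
```

Each site is processed independently and the work per site is small compared with the data moved, which is why a test of this shape is typically used to probe memory bandwidth rather than peak arithmetic throughput.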

Keywords:

Parallel computing
Computer architecture
CUDA
Expression templates
Software portability
Lattice QCD
Grid
Exascale computing
