High Performance by Exploiting Information Locality through Reverse Computing

Mouad Bahi, Christine Eisenbeis

2011

In this paper we present performance results for our register rematerialization technique based on reverse recomputing. Rematerialization adds instructions, and we show on one specifically designed example that reverse computing alleviates the impact of these additional instructions on performance. We also show how thread parallelism may be optimized on GPUs by performing register allocation with reverse recomputing, which increases the number of threads per Streaming Multiprocessor (SM). We apply this to the main kernel of a Lattice Quantum Chromodynamics (LQCD) simulation program, where we obtain a 10.84% speedup.
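To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions. First, 32-bit integer registers are modeled with wraparound arithmetic, so `b = a + c` can be inverted exactly as `a = b - c`. This lets `a` be rematerialized from values that are still live instead of occupying a register across a high-pressure region. Second, threads per SM are estimated from registers per thread using Fermi-like limits (32768 registers and at most 1536 resident threads per SM), counted at warp granularity only. All function names and hardware figures here are illustrative.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit unsigned registers (wraparound arithmetic)


def add32(x: int, y: int) -> int:
    """Forward instruction: b = a + c on a 32-bit register."""
    return (x + y) & MASK32


def sub32(x: int, y: int) -> int:
    """Inverse instruction: a = b - c on a 32-bit register."""
    return (x - y) & MASK32


def run_with_reverse_remat(a: int, c: int) -> tuple:
    # Forward step: after this point `a` need not be kept live,
    # because it can be recovered from `b` and `c`.
    b = add32(a, c)
    # ... a region with high register pressure would go here ...
    # Reverse step: rematerialize `a` by inverting the addition.
    a_again = sub32(b, c)
    return b, a_again


def threads_per_sm(regs_per_thread: int,
                   regs_per_sm: int = 32768,   # assumed Fermi-like register file
                   max_threads: int = 1536,    # assumed resident-thread limit
                   warp_size: int = 32) -> int:
    # Simplified occupancy: whole warps that fit in the register file,
    # capped by the hardware thread limit (ignores block and
    # register-allocation granularity).
    warps = regs_per_sm // (regs_per_thread * warp_size)
    return min(warps * warp_size, max_threads)


if __name__ == "__main__":
    print(run_with_reverse_remat(5, 7))       # (12, 5)
    print(threads_per_sm(32), threads_per_sm(24))  # 1024 1344
```

Under these assumed limits, reducing a kernel from 32 to 24 registers per thread raises the resident threads from 1024 to 1344. Exact inversion holds for modular integer arithmetic. Floating-point addition rounds, so `(a + c) - c` is not guaranteed to equal `a`.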

Keywords:

Register allocation
Computer science
Speedup
Real-time computing
Parallel computing
Thread (computing)
Task parallelism
Instruction-level parallelism
Rematerialization
Reversible computing
Instruction set
Multiprocessing