Improving GPGPU Performance via Cache Locality Aware Thread Block Scheduling

2017 
Modern GPGPUs support the concurrent execution of thousands of threads to provide an energy-efficient platform. However, this massive multi-threading incurs serious cache contention, since the cache lines brought in by one thread can easily be evicted by other threads in the small shared cache. In this paper, we propose a software-hardware cooperative approach that exploits the spatial locality among different thread blocks to better utilize the limited cache capacity. Through dynamic locality estimation and thread block scheduling, our approach captures more performance improvement opportunities than prior work, which only exploits the spatial locality between consecutive thread blocks. Evaluations across diverse GPGPU applications show that our locality-aware scheduler improves performance by an average of 25 percent over the commonly used round-robin scheduler and 9 percent over the state-of-the-art scheduler.
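The paper itself includes no code; as a minimal sketch of the kind of inter-block spatial locality such a scheduler targets, consider the hypothetical CUDA stencil below (identifiers such as stencil1d and RADIUS are illustrative assumptions, not from the paper). Consecutive thread blocks read overlapping halo regions of the same input array, so dispatching neighboring blocks to the same SM lets the halo cache lines fetched by one block hit in that SM's L1 cache for its neighbor, whereas a round-robin scheduler scatters those blocks across SMs and forfeits the reuse.

```cuda
// Sketch only: a 1-D stencil where block b reads input elements
// [b*blockDim.x - RADIUS, (b+1)*blockDim.x + RADIUS), so consecutive
// blocks b and b+1 share a 2*RADIUS-element halo of the input array.
// Co-scheduling neighboring blocks on one SM turns those shared lines
// into L1 hits instead of repeated DRAM/L2 fetches.
#include <cuda_runtime.h>

#define RADIUS 4  // hypothetical halo width

__global__ void stencil1d(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int r = -RADIUS; r <= RADIUS; ++r) {
        int j = i + r;               // j crosses block boundaries:
        if (j >= 0 && j < n)         // this is the inter-block reuse
            acc += in[j];
    }
    out[i] = acc / (2 * RADIUS + 1);
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    // With 256-thread blocks, blocks b and b+1 overlap on 2*RADIUS inputs.
    stencil1d<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The dynamic locality estimation described in the abstract presumably generalizes beyond this adjacent-block pattern: rather than assuming that only consecutive block IDs share data, the scheduler estimates at run time which blocks actually touch nearby addresses and groups those blocks at dispatch time.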