Rapid Memory Footprint Access Diagnostics
2020
Footprint and reuse distance measure temporal locality and therefore do not capture the significance of access patterns (spacial locality). A strided access pattern has the largest possible footprint but usually has the best performance. To highlight exposed memory latency, we separate footprint into strided (prefetchable) and irregular (non-prefetchable) access components and calculate the growth rate of each. To rapidly compute these footprint access diagnostics, we present two methods, whole-program and precise. Current footprint analyses can cause 200× or more slowdown with realistic inputs and are therefore impractical. Our whole-program method reduces the overhead to 10% by computing upper bounds, but still yields inter-procedural insight through a call path profile. Our precise method uses additional static analysis and profiling to refine the upper bounds for intra-procedural loop nests. We evaluate our approaches using benchmarks that vary access patterns (strided vs. unpredictable), sparsity (all words in a cache line vs. some), and reuse (varying and repeated accesses per element). Notably, for loop nests with unpredictable accesses, the precise method's accuracy is within 10% of ideal. The whole-program method has sufficient accuracy to diagnose bottlenecks.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
34
References
0
Citations
NaN
KQI