Rapid Memory Footprint Access Diagnostics

2020 
Footprint and reuse distance measure temporal locality and therefore do not capture the significance of access patterns (spacial locality). A strided access pattern has the largest possible footprint but usually has the best performance. To highlight exposed memory latency, we separate footprint into strided (prefetchable) and irregular (non-prefetchable) access components and calculate the growth rate of each. To rapidly compute these footprint access diagnostics, we present two methods, whole-program and precise. Current footprint analyses can cause 200× or more slowdown with realistic inputs and are therefore impractical. Our whole-program method reduces the overhead to 10% by computing upper bounds, but still yields inter-procedural insight through a call path profile. Our precise method uses additional static analysis and profiling to refine the upper bounds for intra-procedural loop nests. We evaluate our approaches using benchmarks that vary access patterns (strided vs. unpredictable), sparsity (all words in a cache line vs. some), and reuse (varying and repeated accesses per element). Notably, for loop nests with unpredictable accesses, the precise method's accuracy is within 10% of ideal. The whole-program method has sufficient accuracy to diagnose bottlenecks.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    34
    References
    0
    Citations
    NaN
    KQI
    []