Gregarious Data Re-structuring in a Many Core Architecture

2015 
As new massively multithreaded many-core architectural designs continue to evolve, the challenge of finding schedules that exploit concurrency, reuse and locality remains. Classically, data transformations were made with a limited view of both the memory hierarchy and the parallelism available to the machine. However, as multithreaded designs become more complex, machine resources tend to be "grouped" or shared in more sophisticated arrangements (e.g. distributed Level 2 caches serving collectively as a Level 3 cache, or a high degree of simultaneous multithreading). These new configurations present optimization opportunities that software tool chains might not be aware of and, therefore, miss altogether. In this paper, we develop a new methodology that takes into consideration the access patterns of a single parallel actor (e.g. a thread) as well as the access patterns of "grouped" parallel actors that share a resource (e.g. a distributed Level 3 cache). We start with hierarchically tiled code for our target machine and apply a series of transformations at the tile level to improve data residence in a given level of the memory hierarchy. The contributions of this paper include (a) collaborative data restructuring for group reuse and (b) a low-overhead transformation technique that improves the access pattern and brings closely connected data elements together. Preliminary results on a many-core architecture, the Tilera TileGX, show promising improvements over optimized OpenMP code (up to a 31% increase in GFLOPS) and over our own previous work on fine-grained runtimes (up to 16%) for selected kernels.
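The idea of bringing closely connected data elements together at the tile level can be illustrated with a generic tile-major layout copy. The following is a minimal sketch only, not the transformation developed in the paper: it assumes a dense row-major matrix stored in a flat list and tile sizes that divide the matrix dimensions evenly, and the function names `to_tile_major` and `tile_major_index` are hypothetical.

```python
def to_tile_major(a, n_rows, n_cols, tile_rows, tile_cols):
    """Copy a row-major matrix (flat list) so each tile is contiguous.

    Assumption for this sketch: tile sizes divide the matrix dimensions.
    """
    if n_rows % tile_rows or n_cols % tile_cols:
        raise ValueError("tile size must divide the matrix dimensions")
    out = []
    for tr in range(0, n_rows, tile_rows):          # tile row origin
        for tc in range(0, n_cols, tile_cols):      # tile column origin
            for r in range(tr, tr + tile_rows):     # rows inside the tile
                start = r * n_cols + tc
                out.extend(a[start:start + tile_cols])
    return out


def tile_major_index(r, c, n_cols, tile_rows, tile_cols):
    """Map element (r, c) to its offset in the tile-major copy."""
    tiles_per_row = n_cols // tile_cols
    tile_id = (r // tile_rows) * tiles_per_row + (c // tile_cols)
    within = (r % tile_rows) * tile_cols + (c % tile_cols)
    return tile_id * tile_rows * tile_cols + within


# A 4x4 matrix split into 2x2 tiles.
matrix = list(range(16))
tiled = to_tile_major(matrix, 4, 4, 2, 2)
```

After the copy, all elements of one tile occupy a single contiguous range, so a thread (or a group of threads sharing a cache) that works on that tile touches fewer distinct cache lines than it would in the row-major layout. How tiles are assigned to groups of actors is outside the scope of this sketch.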