Automatic cluster parallelization and minimizing communication via selective data replication

2015 
Technology scaling has initiated two distinct trends that are likely to continue into the future: first, increased parallelism in hardware, and second, the increasing performance and energy cost of communication relative to computation. Both trends call for compiler and runtime systems that automatically parallelize programs and reduce communication in parallel computations, so that high performance is achieved in an energy-efficient fashion. In this paper, we propose the design of an integrated compiler and runtime system that auto-parallelizes loop nests for clusters, together with a novel communication-avoidance method that reduces data movement between processors. Communication minimization is achieved via data replication: data is replicated so that a larger share of the whole data set may be mapped to a processor, and hence non-local memory accesses are reduced. Experiments on a number of benchmarks show the effectiveness of the approach.
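The replication idea can be illustrated with a toy model. The sketch below is not the system described in the paper. It assumes a hypothetical 1-D three-point stencil, a block partition of the data across processors, and a hypothetical `halo` parameter giving the number of neighboring elements each processor additionally holds as replicated copies. It counts only the reads that fall outside what a processor holds.

```python
def count_remote_accesses(n, procs, halo):
    """Count non-local reads for a 1-D three-point stencil over n elements.

    Illustrative assumptions: each processor owns a contiguous block of
    n // procs elements and also holds replicated copies of `halo`
    elements on each side of its block.
    """
    block = n // procs
    remote = 0
    for p in range(procs):
        lo, hi = p * block, (p + 1) * block  # owned index range [lo, hi)
        # Range of elements available locally once replicas are included.
        held_lo, held_hi = max(0, lo - halo), min(n, hi + halo)
        for i in range(lo, hi):
            for j in (i - 1, i, i + 1):  # stencil reads for element i
                in_bounds = 0 <= j < n
                if in_bounds and not (held_lo <= j < held_hi):
                    remote += 1
    return remote


no_replication = count_remote_accesses(16, 4, halo=0)
with_replication = count_remote_accesses(16, 4, halo=1)
print(no_replication, with_replication)
```

In this toy model, replicating one boundary element per side removes every non-local read. The model ignores the cost of creating the replicas and of keeping them consistent if the data is written, which any real scheme would have to weigh against the saved communication.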