Performance Evaluation of a 3D-Stencil Library for Distributed Memory Array Accelerators

2014 
EMAX (Energy-aware Multimode Accelerator Extension) is an accelerator equipped with distributed single-port local memories and ring-shaped interconnections. It is designed to achieve extremely high throughput for scientific computation, big data and image processing while keeping power consumption low. However, before mapping algorithms onto the accelerator, application developers need sufficient knowledge of the hardware organization and its specially designed instructions. Furthermore, when no well-designed compiler or library is available, they must make significant efforts to tune the code for execution efficiency. To address this problem, we focus on library support for stencil (nearest-neighbor) computations, a class of algorithms widely used in partial differential equation (PDE) solvers. In this research, we take up the following topics: (1) the system configuration, features and mnemonics of EMAX, (2) instruction mapping techniques that reduce the amount of data to be read from main memory, and (3) performance evaluation of the library for PDE solvers. Because the library can reuse local data across outer-loop iterations and can map many instructions by unrolling outer loops, the amount of data read from main memory is reduced to as little as 1/7 of that required by hand-tuned code. In addition, the stencil library reduced execution time by 23% compared with a general-purpose processor.
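As a rough illustration of what a stencil (nearest-neighbor) computation looks like, and of why reusing locally held data can cut main-memory traffic, here is a minimal Python sketch. It is not the EMAX library or its instruction mapping: the 7-point averaging stencil, the function names `stencil_sweep` and `main_memory_reads`, and the simple load-counting model are all assumptions made for this example.

```python
def stencil_sweep(u):
    """One sweep of a 7-point averaging stencil over the interior of a 3D grid.

    `u` is a nested list indexed as u[z][y][x]. Boundary points are copied
    unchanged. The 7-point shape is an assumption for illustration.
    """
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    new = [[row[:] for row in plane] for plane in u]
    for z in range(1, nz - 1):          # outer loop over planes
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                new[z][y][x] = (
                    u[z][y][x]
                    + u[z - 1][y][x] + u[z + 1][y][x]
                    + u[z][y - 1][x] + u[z][y + 1][x]
                    + u[z][y][x - 1] + u[z][y][x + 1]
                ) / 7.0
    return new


def main_memory_reads(nz, ny, nx, reuse_local_data):
    """Count element loads per sweep in a toy model (hypothetical, not measured).

    Without reuse, every interior output point loads all 7 inputs from main
    memory. With reuse, each grid element is loaded once and then served from
    local memory for every stencil that touches it.
    """
    if reuse_local_data:
        return nz * ny * nx
    return 7 * (nz - 2) * (ny - 2) * (nx - 2)


if __name__ == "__main__":
    print(main_memory_reads(10, 10, 10, reuse_local_data=False))
    print(main_memory_reads(10, 10, 10, reuse_local_data=True))
```

In this toy model the ratio of loads with and without reuse approaches 1/7 as the grid grows. The abstract does not state that this is how the reported reduction arises, so the sketch should be read only as intuition for the general effect.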