High-performance code optimizations for mobile devices

2019 
Mobile device performance has increased in recent years thanks to improvements in System-on-Chip (SoC) technologies. These shared-memory systems now integrate multicore CPUs and accelerators, and obtaining optimal performance from such heterogeneous architectures requires using the accelerators efficiently. Graphics Processing Units (GPUs) are accelerators that often outperform multicore CPUs on data-parallel workloads by orders of magnitude, which makes them especially important for image processing applications on mobile devices. In this work we explore tiling code optimizations for GPU applications running on mobile devices. We propose a dynamic, adaptive tile-size selection methodology that finds close-to-optimal parameterizations at runtime, independently of the underlying architecture. Results on a set of stencil-based image processing benchmarks demonstrate the performance benefits of these optimizations.
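The general idea of runtime tile-size selection can be illustrated with a minimal CPU-side sketch. It is not the methodology of this work, whose details the abstract does not give: the 3x3 box-blur stencil, the exhaustive timing loop, and the names `box_blur_tiled` and `select_tile_size` are all assumptions chosen for illustration.

```python
import time


def box_blur_tiled(img, tile):
    """3x3 box blur over a 2D list, traversed tile by tile (borders clamped).

    Hypothetical stand-in for a stencil kernel; a real GPU version would map
    each tile to a work-group instead of looping on the CPU.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            # Process one tile; min() handles tiles cut off by the image edge.
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    acc = 0.0
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            yy = min(max(y + dy, 0), h - 1)
                            xx = min(max(x + dx, 0), w - 1)
                            acc += img[yy][xx]
                    out[y][x] = acc / 9.0
    return out


def select_tile_size(kernel, img, candidates, trials=2):
    """Time each candidate tile size at runtime and return the fastest.

    Assumed strategy: exhaustive search over a small candidate list, keeping
    the best of `trials` runs per candidate to reduce timing noise.
    """
    best_tile, best_time = None, float("inf")
    for tile in candidates:
        elapsed = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            kernel(img, tile)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile
```

For example, `select_tile_size(box_blur_tiled, image, [4, 8, 16, 32])` would probe four tile sizes on the current device and return whichever ran fastest; the output image itself does not depend on the tile size, only the traversal order does.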