Exploring Data Streaming to Improve 3D FFT Implementation on Multiple GPUs

Cleomar Pereira da Silva, Leandro Fontoura Cupertino, Daniel Salles Chevitarese, Marco Aurélio Cavalcanti Pacheco, Cristiana Bentes

2010

FFT is a well-known algorithm that is widely used in scientific and engineering applications. However, FFT is a memory-bound problem that still presents performance challenges to new generations of computer architectures because of its relatively low ratio of computation to memory accesses. On GPU architectures, where data transfers between host CPU memory and device memory are very expensive, this memory overhead can become a major bottleneck for large problems. In this work, we propose an efficient parallel implementation of FFT on multiple GPUs that tackles the overhead of host memory access by implementing a streaming scheme that hides the data transfer latency. The idea is to divide the problem into smaller ones, generating several lighter, asynchronous memory transfers from host to device, so that computation on data already transferred can proceed concurrently with the remaining transfers. We obtained an acceleration of approximately 60% over the non-streamed GPU implementation.
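The benefit of overlapping transfers with computation can be illustrated with a toy timing model. The sketch below is not the authors' implementation: it assumes a single copy engine and a single compute engine, perfectly divisible work, and no per-chunk launch or synchronization overhead, and it ignores the device-to-host transfer and the data dependencies between the dimensions of a 3D FFT. The function names `serial_time` and `streamed_time` and all cost values are hypothetical.

```python
def serial_time(transfer, compute):
    """Non-streamed baseline: copy all data to the device, then compute."""
    return transfer + compute


def streamed_time(transfer, compute, chunks):
    """Two-stage pipeline: the copy of one chunk overlaps the compute of the previous one.

    Assumes (hypothetically) that both costs split evenly across `chunks`
    and that one copy engine and one compute engine run concurrently.
    """
    t_copy = transfer / chunks
    t_comp = compute / chunks
    copy_done = 0.0  # time at which the latest chunk has arrived on the device
    comp_done = 0.0  # time at which the latest chunk has been processed
    for _ in range(chunks):
        # Copies are issued back to back on the copy engine.
        copy_done += t_copy
        # A chunk is processed once its data has arrived and the
        # previous chunk's computation has finished.
        comp_done = max(comp_done, copy_done) + t_comp
    return comp_done


if __name__ == "__main__":
    # Arbitrary illustrative costs, not measurements from the paper.
    for n in (1, 2, 4, 8):
        print(n, streamed_time(8.0, 8.0, n))
```

With equal transfer and compute costs of 8 time units each, this model gives 16 units for the non-streamed case and 10 units with four chunks. In the model, the gain grows with the number of chunks but is bounded by the slower of the two stages; real implementations also pay a per-transfer overhead that this sketch leaves out.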
