Floating-Point Arithmetic Using GPGPU on FPGAs

2017 
This paper presents a new tool flow for realizing algorithms in floating-point precision on FPGAs. A customizable multicore soft GPU architecture is used on the hardware side. Two solutions for performing floating-point arithmetic in IEEE-754 single precision are investigated: using standard function calls to GPU-friendly software implementations, or upgrading the hardware of the Processing Elements (PEs). An OpenCL compiler that supports both approaches has been developed, and an IPython API is used to evaluate 15 benchmarks. The suggested tool flow has been compared to several other solutions, including High-Level Synthesis (HLS): on average, our architecture achieves 2.9x better compute density and 11.2x better energy efficiency than a single MicroBlaze processor, or any homogeneous Multi-Processor System-on-Chip (MPSoC) based on it. In addition, speedups of up to 22x can be achieved over an ARM Cortex-A9 supported by a NEON vector coprocessor. For the most complex benchmarks, where software-like implementations are used with HLS, the suggested approach always performed better. When some task parameters were not fixed at synthesis time, our architecture provided better throughput per area and consumed less energy than HLS in most cases.
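To illustrate the first of the two approaches above, the sketch below shows what a GPU-friendly software implementation of single-precision arithmetic typically looks like: the float is treated as a 32-bit integer and its sign, exponent, and significand fields are manipulated with integer operations, which a PE without a hardware FPU can execute. This is a hypothetical minimal example, not the paper's actual library; it handles only normal numbers and zero, omits NaN/infinity/subnormals, and truncates instead of rounding to nearest even.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret between float and its IEEE-754 bit pattern. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, 4); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, 4); return f; }

/* Hypothetical soft-float multiply: IEEE-754 single precision,
 * normal numbers and zero only, truncating rounding for brevity. */
float soft_fmul(float fa, float fb)
{
    uint32_t a = f2u(fa), b = f2u(fb);
    uint32_t sign = (a ^ b) & 0x80000000u;       /* XOR of the sign bits   */
    int32_t ea = (a >> 23) & 0xFF;               /* biased exponents       */
    int32_t eb = (b >> 23) & 0xFF;
    if (ea == 0 || eb == 0)                      /* zero (or subnormal     */
        return u2f(sign);                        /* flushed to zero)       */
    uint64_t ma = (a & 0x7FFFFFu) | 0x800000u;   /* restore implicit 1     */
    uint64_t mb = (b & 0x7FFFFFu) | 0x800000u;
    uint64_t prod = ma * mb;                     /* up to 48-bit product   */
    int32_t e = ea + eb - 127;                   /* re-bias the exponent   */
    if (prod & (1ull << 47)) {                   /* significand in [2,4):  */
        prod >>= 1;                              /* renormalize to [1,2)   */
        e += 1;
    }
    uint32_t mant = (uint32_t)(prod >> 23) & 0x7FFFFFu; /* drop hidden bit */
    return u2f(sign | ((uint32_t)e << 23) | mant);
}
```

On the soft GPU, such a routine would be reached through a standard function call emitted by the OpenCL compiler; the alternative hardware path replaces this entire sequence of integer instructions with a single operation on an FPU-equipped PE, which is the area-versus-speed trade-off the paper evaluates.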