GPU-Accelerated Generation of Correctly Rounded Elementary Functions

2017 
The IEEE 754-2008 standard recommends the correct rounding of some elementary functions. This requires solving the Table Maker’s Dilemma (TMD), which demands a huge amount of CPU computation time. In this article, we consider accelerating such computations, namely the Lefèvre algorithm, on graphics processing units (GPUs), which are massively parallel architectures with a partial single instruction, multiple data (SIMD) execution. We first propose an analysis of the Lefèvre hard-to-round argument search using the concept of continued fractions. We then propose a new parallel search algorithm that is much more efficient on GPUs thanks to its more regular control flow. We also present an efficient hybrid CPU-GPU deployment of the generation of the polynomial approximations required in the Lefèvre algorithm. In the end, we obtain overall speedups of up to 53.4× on one GPU over a sequential CPU execution and up to 7.1× over a hex-core CPU, which enable a much faster solution of the TMD for the double-precision format.
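As a small illustration of the continued-fraction concept mentioned above, the following Python sketch expands a rational number into its partial quotients and rebuilds its convergents. It is not the hard-to-round argument search itself, only the underlying arithmetic tool; the function names `continued_fraction` and `convergents` are hypothetical and chosen for this example.

```python
from fractions import Fraction


def continued_fraction(x):
    """Return the partial quotients [a0, a1, a2, ...] of a rational x.

    This is Euclid's algorithm applied to numerator and denominator.
    """
    quotients = []
    num, den = x.numerator, x.denominator
    while den != 0:
        a, rem = divmod(num, den)  # floor quotient and remainder
        quotients.append(a)
        num, den = den, rem
    return quotients


def convergents(quotients):
    """Return the successive convergents p_k/q_k of a continued fraction.

    Uses the standard recurrence p_k = a_k*p_{k-1} + p_{k-2},
    and likewise for q_k.
    """
    p_prev, p = 1, quotients[0]
    q_prev, q = 0, 1
    result = [Fraction(p, q)]
    for a in quotients[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result


if __name__ == "__main__":
    x = Fraction(415, 93)
    cf = continued_fraction(x)
    print(cf)               # partial quotients of 415/93
    print(convergents(cf))  # increasingly accurate rational approximations
```

The convergents are rational approximations of increasing accuracy, and the last one equals the input exactly when the input is rational.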