ExBLAS: Reproducible and Accurate BLAS Library

2015 
Due to non-associativity of floating-point operations and dynamic scheduling on parallel architectures, getting a bit-wise reproducible floating-point result for multiple executions of the same code on different or even similar parallel architectures is challenging. We address the problem of reproducibility in the context of fundamental linear algebra operations – like the ones included in the BLAS library – and propose algorithms that yield both reproducible and accurate results (correct rounding, except for triangular solver). We present implementations of these algorithms for the BLAS routines along with the performance results in parallel environments such as Intel desktop and server CPUs, Intel Xeon Phi, and both NVIDIA and AMD GPUs.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    8
    References
    17
    Citations
    NaN
    KQI
    []