Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures

2015 
Recent studies have shown the potential of task-based programming paradigms for implementing robust, scalable sparse direct solvers for modern computing platforms. Yet designing task flows that efficiently exploit heterogeneous architectures remains highly challenging. In this paper we first tackle the issue of data partitioning using a method suited to heterogeneous platforms. On the one hand, we design tasks of sufficiently large granularity to obtain a good acceleration factor on the GPU. On the other hand, we limit that size in order both to fit the GPU memory constraints and to generate enough parallelism in the task graph. Second, we handle task scheduling with a strategy capable of taking into account workload and architecture heterogeneity at a reduced cost. Finally, we propose an original evaluation of the performance obtained with our solver on a test set of matrices. We show that the proposed approach allows for processing extremely large input problems on GPU-accelerated platforms and that the overall performance is competitive with equivalent state-of-the-art solvers designed and optimized for GPU-only use.
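The granularity trade-off described above can be illustrated with a minimal sketch. It is not the partitioning method of the paper: the function name `choose_block_width`, its parameters, and the default threshold `gpu_min_width` are all hypothetical, and the model assumes a dense m-by-n front split into column blocks of double-precision entries. The sketch picks the widest block that still fits a GPU memory budget and leaves at least `min_tasks` blocks, then reports whether that width reaches an assumed GPU-efficiency threshold.

```python
def choose_block_width(m, n, gpu_mem_bytes, min_tasks,
                       gpu_min_width=256, bytes_per_entry=8):
    """Pick a column-block width for an m-by-n dense front (illustrative only).

    All parameters are hypothetical: gpu_mem_bytes is the memory budget for
    one block, min_tasks is the minimum number of blocks wanted for
    parallelism, and gpu_min_width is an assumed width below which GPU
    kernels are considered inefficient.
    """
    # Memory cap: one m-by-width block must fit in the GPU budget.
    mem_cap = gpu_mem_bytes // (m * bytes_per_entry)
    # Parallelism cap: keep at least min_tasks blocks across the n columns.
    par_cap = max(1, n // min_tasks)
    # Take the widest block allowed by both caps (and by the front itself).
    width = max(1, min(n, mem_cap, par_cap))
    # Report whether the chosen width reaches the assumed GPU threshold.
    gpu_friendly = width >= gpu_min_width
    return width, gpu_friendly


if __name__ == "__main__":
    # Parallelism is the binding constraint here.
    print(choose_block_width(10_000, 4_000, 1 << 30, min_tasks=8))
    # GPU memory is the binding constraint here.
    print(choose_block_width(1_000_000, 4_000, 800_000_000, min_tasks=2))
```

In this toy model the two upper bounds are treated as hard limits and the GPU-efficiency threshold as a soft one, so a block narrower than `gpu_min_width` is still returned but flagged, for example so that a scheduler could prefer to run it on a CPU core.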