Scalability of parallel finite element algorithms on multi-core platforms

2016 
The speedup of element-by-element (EBE) FEM algorithms depends not only on peak processor performance but also on the access time to shared mesh data. Removing the memory-bound bottleneck would significantly speed up unstructured-mesh computations on hybrid multi-core architectures, where the gap between processor and memory performance continues to grow. Such a speedup can be achieved by ordering the unknowns so that only elements with no common nodes are processed in parallel, which minimizes memory conflicts. FEM assembly is performed with respect to this ordering, which defines how the vectors are composed. The mesh can be partitioned into disjoint subdomains using different layer-by-layer schemes. In this work, we evaluated several partitioning schemes (block, odd, even, and their modifications) on multi-core platforms using Gunther's Universal Law of Computational Scalability. We performed numerical experiments with element-by-element matrix-vector multiplication on unstructured meshes on multi-core processors accelerated by MIC and GPU. With ordering, we achieved a 5-fold speedup on CPU, a 40-fold speedup on MIC, and a 200-fold speedup on GPU.
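The idea of processing in parallel only elements that share no nodes can be illustrated with a minimal sketch. It does not reproduce the layer-by-layer block, odd, and even schemes evaluated in the paper; a simple greedy element coloring stands in for them here. All names (`greedy_element_coloring`, `ebe_matvec_colored`, `usl_speedup`) are hypothetical, and the loops run serially, only marking where a parallel loop could go. The last function evaluates the standard form of Gunther's Universal Scalability Law, C(N) = N / (1 + sigma (N - 1) + kappa N (N - 1)), with contention coefficient sigma and coherency coefficient kappa.

```python
def greedy_element_coloring(elements):
    """Give each element the smallest color not used by an element sharing a node.

    `elements` is a list of node-index tuples. Elements with the same color
    have no common nodes, so their contributions never touch the same entry.
    """
    node_to_elems = {}
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems.setdefault(n, []).append(e)

    colors = [-1] * len(elements)  # -1 means "not colored yet"
    for e, nodes in enumerate(elements):
        used = {colors[nb] for n in nodes for nb in node_to_elems[n]
                if colors[nb] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def ebe_matvec_colored(elements, element_matrices, x, colors):
    """Compute y = A x element by element, one color at a time.

    The global matrix A is never assembled. Within one color, no two
    elements write to the same entry of y, so the inner loop over elements
    could be run in parallel without atomics or locks (serial here).
    """
    y = [0.0] * len(x)
    for c in range(max(colors) + 1):
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            ke = element_matrices[e]
            for i, gi in enumerate(nodes):
                y[gi] += sum(ke[i][j] * x[gj] for j, gj in enumerate(nodes))
    return y


def usl_speedup(n, sigma, kappa):
    """Universal Scalability Law: relative capacity on n processors."""
    return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


# Toy example: a 1D chain of four two-node elements with a stiffness-like
# element matrix. The mesh and the values are illustrative only.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
element_matrices = [[[1.0, -1.0], [-1.0, 1.0]] for _ in elements]
x = [0.0, 1.0, 2.0, 3.0, 4.0]

colors = greedy_element_coloring(elements)
y = ebe_matvec_colored(elements, element_matrices, x, colors)
```

On this chain the coloring alternates between two colors, so elements 0 and 2 can be processed together, followed by elements 1 and 3. Fitting sigma and kappa of `usl_speedup` to measured speedups is one way to compare how well different orderings scale.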