Accelerating Assembly Operation in Element-by-Element FEM on Multicore Platforms

2016 
The speedup of element-by-element FEM algorithms depends not only on peak processor performance but also on the access time to shared mesh data. Eliminating memory-bound behavior would significantly speed up unstructured mesh computations on hybrid multicore architectures, where the gap between processor and memory performance continues to grow. This speedup can be achieved by ordering the unknowns so that only elements with no common nodes are processed in parallel. If vectors are assembled according to this ordering, memory conflicts are minimized. The mesh was partitioned into layers using neighborhood relationships. We evaluated several partitioning schemes (block, odd-even parity, and their modifications) on multicore platforms, using Gunther’s Universal Law of Computational Scalability. We performed numerical experiments with element-by-element matrix-vector multiplication on unstructured meshes on multicore processors accelerated by MIC and GPU. We achieved a 5-fold speedup on CPU, a 40-fold speedup on MIC, and a 200-fold speedup on GPU.
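The idea of processing in parallel only elements that share no nodes can be illustrated with a minimal sketch. It uses a generic greedy element coloring, which is an assumption made here for illustration and not the layer-based partitioning evaluated in the paper. The function names (`color_elements`, `group_by_color`, `ebe_matvec`) and the data layout (elements as tuples of node indices, dense per-element matrices) are hypothetical. The loop over each group is written serially; it only marks where a parallel loop could run without write conflicts.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: any two elements that share a node get different colors.

    `elements` is a list of tuples of node indices (hypothetical layout).
    """
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbors (elements sharing at least one node).
        used = {
            colors[nb]
            for n in nodes
            for nb in node_to_elems[n]
            if colors[nb] >= 0
        }
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def group_by_color(colors):
    """Return lists of element indices, one list per color."""
    groups = defaultdict(list)
    for e, c in enumerate(colors):
        groups[c].append(e)
    return [groups[c] for c in sorted(groups)]


def ebe_matvec(elements, elem_matrices, x, groups):
    """Element-by-element y = A x without forming the global matrix A."""
    y = [0.0] * len(x)
    for group in groups:
        # Elements in one group share no nodes, so their scatter-adds into y
        # touch disjoint entries; this inner loop is the parallelizable part.
        for e in group:
            nodes = elements[e]
            Ke = elem_matrices[e]
            for i, ni in enumerate(nodes):
                y[ni] += sum(Ke[i][j] * x[nj] for j, nj in enumerate(nodes))
    return y


if __name__ == "__main__":
    # Three triangles: the first two share an edge, the last two share a node.
    elements = [(0, 1, 2), (1, 2, 3), (3, 4, 5)]
    ones = [[1.0] * 3 for _ in range(3)]
    groups = group_by_color(color_elements(elements))
    print(groups)
    print(ebe_matvec(elements, [ones] * 3, [1.0] * 6, groups))
```

Groups are processed one after another, so the number of colors bounds the number of synchronization points. A layer-based partition such as the one described above would replace `color_elements` while leaving the matrix-vector loop unchanged.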