Performance analysis and comparison of cellular automata GPU implementations

2017 
Cellular automata (CA) models are of interest to several scientific areas, and there is growing interest in exploring large systems that require high-performance computing. This work presents a CA implementation that performs well on five different NVIDIA GPU architectures, from Tesla to Maxwell, simulating systems with up to a billion cells. Using the game of life (GoL) and a more complex variation of GoL as examples, a performance of 5.58e6 evaluated cells/s is achieved. The two optimizations most often used in previous studies are shared memory and Multicell algorithms. Here, these optimizations do not improve performance on Fermi or newer architectures. Compared to an optimized serial CPU implementation, the GoL CA code running on an NVIDIA Titan X obtained a speedup of up to \(\sim 85\times\), and up to \(\sim 230\times\) for the more complex CA. Finally, the efficiency of each GPU is analyzed in terms of cell performance/transistors and cell performance/bandwidth, showing how the architectures improved for this particular problem.
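To make the quantities in the abstract concrete, the following is a minimal sketch of one serial GoL update together with the two metrics mentioned above (evaluated cells per second and speedup). It is not the authors' code: the function names `gol_step`, `cells_per_second`, and `speedup` are hypothetical, and periodic boundary conditions are an assumption, since the abstract does not state which boundaries were used.

```python
def gol_step(grid):
    """One synchronous Game of Life update on a 2D list of 0/1 cells.

    Assumption: periodic (wrap-around) boundaries; the abstract does not
    specify the boundary conditions used in the paper.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the 8 neighbours of cell (r, c).
            n = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr or dc:
                        n += grid[(r + dr) % rows][(c + dc) % cols]
            # Standard GoL rule: birth on 3, survival on 2 or 3.
            new[r][c] = 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0
    return new


def cells_per_second(n_cells, n_steps, elapsed_seconds):
    """Evaluated cells per second: every cell is evaluated once per step."""
    return n_cells * n_steps / elapsed_seconds


def speedup(t_serial, t_parallel):
    """Ratio of serial run time to parallel run time for the same workload."""
    return t_serial / t_parallel
```

A GPU version would typically assign each cell (or, in a Multicell scheme, several cells) to one thread instead of iterating over the two outer loops; the rule applied to each cell stays the same.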