From CPU to FPGA — Acceleration of self-organizing maps for data mining

2017 
Big data and machine learning applications pose steadily increasing challenges to the underlying compute platforms in terms of performance and energy efficiency. In this paper we use the highly scalable heterogeneous server platform RECS to evaluate a wide variety of hardware platforms, ranging from general-purpose CPUs and ARM-based SoCs to GPGPUs and FPGAs. The self-organizing map, a popular neural network model for unsupervised clustering and dimensionality reduction, serves as a typical example of a machine learning application in the big data domain. Optimized implementations of the algorithm have been developed for each of the target architectures. An in-depth analysis of the achieved performance and energy efficiency across a wide range of application parameters shows that no single architecture is the most energy-efficient over the complete design space. In our study, ARM-based SoCs achieved the highest efficiency for small network sizes, while FPGAs and GPGPUs perform best for large data sets. Compared to an implementation based on the Matlab SOM toolbox, our optimized multi-threaded CPU implementation achieves two orders of magnitude higher performance and energy efficiency. Large simulations especially benefit from the FPGA implementation, which outperforms the optimized CPU implementation by a factor of 220 and provides 28-times higher energy efficiency.
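For readers unfamiliar with the algorithm, the following is a minimal sketch of one self-organizing map training step in Python/NumPy. It illustrates only the generic SOM update (best-matching-unit search followed by a Gaussian neighborhood update), not the paper's optimized CPU, GPGPU, or FPGA implementations; the map size, learning rate, and neighborhood radius shown here are illustrative assumptions.

    # Generic SOM training step (illustrative sketch, not the paper's code).
    import numpy as np

    def som_train_step(weights, x, lr=0.1, radius=1.5):
        """weights: (rows, cols, dim) codebook; x: (dim,) input vector."""
        rows, cols, _ = weights.shape
        # 1. Find the best-matching unit (BMU) by Euclidean distance.
        dists = np.linalg.norm(weights - x, axis=2)           # (rows, cols)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # 2. Gaussian neighborhood around the BMU on the map grid.
        gy, gx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
        grid_d2 = (gy - bmu[0]) ** 2 + (gx - bmu[1]) ** 2
        h = np.exp(-grid_d2 / (2 * radius ** 2))              # (rows, cols)
        # 3. Pull all neurons toward the input, weighted by the neighborhood.
        weights += lr * h[..., None] * (x - weights)
        return bmu

    # Usage: train a 10x10 map on random 3-dimensional data.
    rng = np.random.default_rng(0)
    W = rng.random((10, 10, 3))
    for sample in rng.random((1000, 3)):
        som_train_step(W, sample)

In practice the learning rate and radius are decayed over the course of training; the BMU search and the neighborhood-weighted update are the two computational hot spots that the paper's architecture-specific implementations target.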