Towards Power Efficiency in Deep Learning on Data Center Hardware

2019 
Deep learning (DL) is a computationally intensive workload that is expected to grow rapidly in data centers in the near future. Its high energy demand necessitates finding ways to improve computational efficiency. In this work, we directly measure the power drawn by the whole system, as well as by the GPU, CPU, and RAM, during DL training to determine each component's contribution to overall energy consumption. We find that while GPUs consume most of the power (about 70%), the consumption of the other components is also significant, and optimizing them can bring important power savings. Evaluating a multitude of options, we identify the system parameters that yield the largest power savings. Overall, energy savings of over 20% can be obtained by adjusting system settings alone, without changing the workload, at the cost of a minor increase in runtime. Alternatively, if runtime must stay constant, an 18% energy savings is identified. In distributed multi-server DL, we find that scale-out overhead has only a small energy cost, making distributed training more energy-efficient than expected. Implications for the field and ways to make DL more energy-efficient going forward are also discussed.
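The abstract does not describe the instrumentation used for the per-component measurements, so the sketch below is only an illustration of how such a breakdown could be sampled on a Linux server: GPU board power via nvidia-smi and CPU-package/DRAM energy via the RAPL powercap counters. The RAPL domain paths and the monitoring loop are assumptions, not the authors' method.

```python
# Illustrative sketch (not the paper's instrumentation): sample GPU power via
# nvidia-smi and CPU/DRAM energy via the Linux RAPL powercap interface while a
# training job runs, then report the average power per component.
import subprocess
import time
from pathlib import Path

# RAPL energy counters (micro-joules). Domain names and paths vary by platform
# and are assumptions here -- check /sys/class/powercap on the target machine.
RAPL_DOMAINS = {
    "cpu_package": Path("/sys/class/powercap/intel-rapl:0/energy_uj"),
    "dram": Path("/sys/class/powercap/intel-rapl:0:2/energy_uj"),
}

def gpu_power_watts() -> float:
    """Instantaneous board power draw summed over all GPUs, in watts."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        text=True,
    )
    return sum(float(line) for line in out.splitlines() if line.strip())

def read_rapl_uj() -> dict:
    """Read cumulative RAPL energy counters (micro-joules) for each domain."""
    return {name: int(path.read_text()) for name, path in RAPL_DOMAINS.items()}

def monitor(duration_s: float = 60.0, interval_s: float = 1.0) -> dict:
    """Sample component power for duration_s seconds and return averages in watts."""
    gpu_samples = []
    start_energy = read_rapl_uj()
    start_time = time.time()
    while time.time() - start_time < duration_s:
        gpu_samples.append(gpu_power_watts())
        time.sleep(interval_s)
    elapsed = time.time() - start_time
    end_energy = read_rapl_uj()
    averages = {"gpu": sum(gpu_samples) / len(gpu_samples)}
    for name in RAPL_DOMAINS:
        # RAPL counters wrap around periodically; a production tool would
        # handle that overflow explicitly.
        delta_uj = end_energy[name] - start_energy[name]
        averages[name] = delta_uj / 1e6 / elapsed
    return averages

if __name__ == "__main__":
    # Run alongside a training job to get a rough per-component power breakdown.
    print(monitor(duration_s=30.0))
```

Reading the RAPL counters typically requires elevated privileges, and a whole-system figure of the kind reported in the paper would normally come from an external power meter or the server's BMC/IPMI sensors rather than from these on-chip counters.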