Distributed Training Large-Scale Deep Architectures

2017 
The scale of data and the scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvements and careful system configuration. In this paper, we focus on a systems approach to speeding up large-scale training. Taking both the algorithmic and system aspects into consideration, we develop a procedure for setting the mini-batch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components, such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines help effectively speed up large-scale deep learning training.
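As a rough illustration of the kind of provisioning question the abstract refers to (not the paper's actual lemma), the sketch below estimates how many parameter servers are needed so that each worker can push gradients and pull updated parameters within a target synchronization time. All names and parameter values here are illustrative assumptions.

```python
# Illustrative back-of-envelope estimate (not the paper's derivation):
# number of parameter servers needed to absorb per-iteration gradient traffic.
import math

def estimate_parameter_servers(model_size_gb: float,
                               num_workers: int,
                               server_bandwidth_gbps: float,
                               target_sync_seconds: float) -> int:
    """Each worker pushes gradients and pulls parameters once per iteration,
    so aggregate traffic per sync is roughly 2 * model_size * num_workers.
    Dividing the required aggregate bandwidth by each server's bandwidth
    gives a lower bound on the server count. Purely illustrative."""
    total_traffic_gb = 2 * model_size_gb * num_workers            # push + pull
    required_gbps = total_traffic_gb * 8 / target_sync_seconds    # GB -> Gbit
    return max(1, math.ceil(required_gbps / server_bandwidth_gbps))

if __name__ == "__main__":
    # Hypothetical setup: 1 GB model, 16 GPU workers, 10 Gbit/s per server,
    # 0.5 s synchronization budget.
    print(estimate_parameter_servers(1.0, 16, 10.0, 0.5))
```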