DELAYED WEIGHT UPDATE FOR FASTER CONVERGENCE IN DATA-PARALLEL DEEP LEARNING

2018 
This paper proposes a data-parallel stochastic gradient descent (SGD) method that uses a delayed weight update. Large-scale neural networks can solve advanced problems, but their processing time increases with network scale. In conventional data parallelism, workers must wait for data communication to and from a server during the weight update. With the proposed data-parallel method, the network weights are updated with a delay and are therefore stale. Nevertheless, the method converges faster because it hides the latency of weight communication with the server: the server carries out the weight communication and the weight update concurrently while the workers compute their gradients. Experimental results demonstrate that the final accuracy of the proposed data-parallel method converges within 1.5% degradation of the conventional method on both VGG and ResNet. At maximum, the convergence speedup factor theoretically reaches double that of conventional data parallelism.
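As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below simulates a one-step-delayed data-parallel SGD loop in plain Python/NumPy. The sequential loop stands in for the overlap between worker gradient computation and the server's update/broadcast; all names (worker_gradient, delayed_data_parallel_sgd, lr, etc.) and the toy quadratic loss are illustrative assumptions.

```python
# Minimal sketch of one-step-delayed data-parallel SGD (illustrative, not the paper's code).
import numpy as np

def worker_gradient(weights, batch):
    # Placeholder worker-side gradient on a toy least-squares loss.
    x, y = batch
    pred = x @ weights
    return x.T @ (pred - y) / len(y)

def delayed_data_parallel_sgd(batches_per_worker, dim, lr=0.1, steps=100):
    weights = np.zeros(dim)          # server's up-to-date weights
    stale_weights = weights.copy()   # copy currently held by the workers (one step old)
    pending_grad = None              # averaged gradient from the previous step

    for t in range(steps):
        # Workers compute gradients on the stale weights while, conceptually,
        # the server applies the previous step's gradient and broadcasts new weights.
        grads = [worker_gradient(stale_weights, b[t % len(b)])
                 for b in batches_per_worker]

        if pending_grad is not None:
            weights -= lr * pending_grad   # server update overlapped with worker compute

        # Broadcast: workers will use these (now one step behind) weights next iteration.
        stale_weights = weights.copy()
        pending_grad = np.mean(grads, axis=0)

    return weights
```

Unrolling the loop gives the delayed recurrence w_{t+1} = w_t - lr * grad(w_{t-1}), which is the sense in which the weights used by the workers are stale while communication latency is hidden.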