The large learning rate phase of deep learning

Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, Guy Gur-Ari

2020

The choice of initial learning rate can have a profound effect on the performance of deep networks. We present empirical evidence that networks exhibit sharply distinct behaviors at small and large learning rates. In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks. At large learning rates, we find that networks exhibit qualitatively distinct phenomena that cannot be explained by existing theory: the loss grows during the early part of training, and optimization eventually converges to a flatter minimum. Furthermore, we find that the best performance is often achieved in the large learning rate phase. To better understand this behavior, we analyze the dynamics of a two-layer linear network and prove that it exhibits these different phases. We find good agreement between our analysis and the training dynamics observed in realistic deep learning settings.
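The abstract does not spell out the two-layer linear model, so the following is only a minimal sketch under stated assumptions. It uses a network with scalar output f = (v . u) x / sqrt(n), one training example (x = 1, target y = 0), squared loss L = f^2 / 2, and u, v initialized i.i.d. from N(0, 1). For this toy model, the curvature (the NTK at x = 1) is λ = (|u|^2 + |v|^2) / n, and one step of gradient descent with learning rate η gives exactly f_{t+1} = f_t (1 - ηλ_t + η^2 f_t^2 / n) and λ_{t+1} = λ_t + (η f_t^2 / n)(ηλ_t - 4). Consequently, when 2 < ηλ_0 < 4 the loss first grows, λ shrinks until ηλ < 2, and training then converges at a flatter point. When ηλ_0 > 4, λ keeps growing and training diverges. The function name `train_uv_model`, its parameters, and the choice η·λ_0 ∈ {1, 3, 5} are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def train_uv_model(eta_times_lam0, n=1000, steps=5000, seed=0):
    """Full-batch gradient descent on the toy network f = (v . u) x / sqrt(n).

    Illustrative assumptions (not from the paper's code): a single training
    example with x = 1 and target y = 0, squared loss L = f**2 / 2, and u, v
    drawn i.i.d. from N(0, 1). The learning rate is eta = eta_times_lam0 / lam0,
    where lam0 = (|u|^2 + |v|^2) / n is the curvature (NTK) at initialization.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    v = rng.standard_normal(n)
    lam0 = (u @ u + v @ v) / n
    eta = eta_times_lam0 / lam0

    losses = []
    for _ in range(steps):
        f = (v @ u) / np.sqrt(n)
        loss = 0.5 * f**2
        losses.append(loss)
        if not np.isfinite(loss) or loss > 1e12:
            # Treat a runaway loss as divergence and stop early.
            return {"lam0": lam0, "eta": eta, "losses": losses,
                    "lam_final": float("nan"), "diverged": True}
        # dL/du = f * v / sqrt(n) and dL/dv = f * u / sqrt(n);
        # both gradients use the parameters from before this step.
        grad_u = f * v / np.sqrt(n)
        grad_v = f * u / np.sqrt(n)
        u, v = u - eta * grad_u, v - eta * grad_v

    lam_final = (u @ u + v @ v) / n
    return {"lam0": lam0, "eta": eta, "losses": losses,
            "lam_final": lam_final, "diverged": False}


if __name__ == "__main__":
    for c in (1.0, 3.0, 5.0):
        r = train_uv_model(c)
        print(f"eta*lam0 = {c}: diverged={r['diverged']}, "
              f"initial loss={r['losses'][0]:.3g}, "
              f"peak loss={max(r['losses']):.3g}, "
              f"final loss={r['losses'][-1]:.3g}, "
              f"eta*lam_final={r['eta'] * r['lam_final']:.3g}")
```

Under these assumptions, one would expect the following behavior. The η·λ_0 = 1 run converges with λ essentially unchanged, matching the small learning rate picture. The η·λ_0 = 3 run shows the loss rising before it falls and ends with η·λ_final below 2, which is a flatter point than at initialization. The η·λ_0 = 5 run blows up. Tracking u and v directly, rather than iterating the scalar recursions, keeps the sketch an honest instance of gradient descent on the network parameters.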
