Online-Codistillation Meets LARS, Going beyond the Limit of Data Parallelism in Deep Learning

2020 
Data-parallel training is a powerful family of methods for the efficient training of deep neural networks on big data. Unfortunately, recent studies have shown that the benefit of increased batch size, in terms of both speed and model performance, diminishes rapidly beyond a certain point. This seems to apply even to LARS, the state-of-the-art large-batch stochastic optimization method. In this paper, we combine LARS with online-codistillation, a recently developed, efficient deep learning algorithm built on a different philosophy: stabilizing the training procedure through a collaborative ensemble of models. We show that the combination of large-batch training and online-codistillation is much more efficient than either one alone. We also present a novel way of implementing online-codistillation that further speeds up the computation. We demonstrate the efficacy of our approach on various benchmark datasets.
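
Below is a minimal sketch, not the authors' implementation, of the two ingredients the abstract combines: an online-codistillation loss, in which two peer models are trained concurrently and each is regularized toward the other's predictions, and a LARS-style layer-wise trust ratio for large-batch optimization. The function names, the mixing weight `alpha`, and the `trust_coeff` value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def codistillation_loss(logits_a, logits_b, targets, alpha=0.5):
    """Illustrative online-codistillation loss for one peer model:
    cross-entropy on the labels plus a KL term pulling model A's
    predictions toward model B's (treated as a fixed target)."""
    ce = F.cross_entropy(logits_a, targets)
    distill = F.kl_div(
        F.log_softmax(logits_a, dim=-1),
        F.softmax(logits_b.detach(), dim=-1),  # peer's predictions, no gradient
        reduction="batchmean",
    )
    return (1 - alpha) * ce + alpha * distill


def lars_scale(param, grad, trust_coeff=1e-3, eps=1e-9):
    """LARS-style layer-wise scaling: the step size for each parameter
    tensor is set by the ratio of its weight norm to its gradient norm."""
    w_norm = param.norm()
    g_norm = grad.norm()
    return trust_coeff * w_norm / (g_norm + eps)
```

In a combined setup of the kind the abstract describes, each peer model would apply `codistillation_loss` with the other peer's logits and then rescale its per-layer updates with `lars_scale` before the optimizer step; the exact schedule and implementation details are those of the paper, not this sketch.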