Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method

2019 
Faster training of deep neural networks is desired to speed up the research and development cycle in deep learning. Distributed deep learning and second-order optimization methods are two different techniques for accelerating the training of deep neural networks. Previous work showed that an approximated second-order optimization method, K-FAC, allows these two techniques to mitigate each other's drawbacks. However, there has been no detailed discussion of its performance, which is critical for practical use. In this work, we propose several performance optimization techniques that reduce the overheads of K-FAC and accelerate the overall training. With all optimizations applied, we speed up training by 1.64 times per iteration compared to the baseline. In addition to the performance optimizations, we construct a simple performance model that predicts training time, helping users determine whether distributed K-FAC is appropriate for their training in terms of wall-clock time.
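The abstract mentions a simple performance model for predicting training time. The sketch below is an illustrative guess at what such a per-iteration model could look like, not the model from the paper: it sums hypothetical compute and communication terms for distributed K-FAC, and every parameter name (e.g. `kfac_factor_flops`, `factor_bytes`) is an assumption introduced for this example.

```python
# Hypothetical per-iteration time model for distributed K-FAC training.
# All terms and parameter names are illustrative assumptions, not the
# performance model described in the paper.

def kfac_iteration_time(
    flops_fwd_bwd,        # forward + backward FLOPs per worker
    kfac_factor_flops,    # FLOPs to build Kronecker factors per worker
    kfac_inverse_flops,   # FLOPs to invert the factors per worker
    grad_bytes,           # bytes all-reduced for gradients
    factor_bytes,         # bytes communicated for Kronecker factors
    compute_flops_per_s,  # sustained device throughput
    network_bytes_per_s,  # effective interconnect bandwidth
):
    """Estimate wall-clock seconds per training iteration."""
    compute = (flops_fwd_bwd + kfac_factor_flops + kfac_inverse_flops) / compute_flops_per_s
    comm = (grad_bytes + factor_bytes) / network_bytes_per_s
    # Assume compute and communication do not overlap (a pessimistic bound).
    return compute + comm
```

A model of this kind would let a user compare the estimated K-FAC iteration time against a first-order baseline (by dropping the K-FAC compute and communication terms) before committing to a full training run.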