A Hessian Free Neural Networks Training Algorithm with Curvature Scaled Adaptive Momentum

2020 
In this paper we propose an algorithm for training neural network architectures, called Hessian Free algorithm with Curvature Scaled Adaptive Momentum (HF-CSAM). The algorithm’s weight update rule is similar to SGD with momentum, but with two main differences arising from the formulation of the training task as a constrained optimization problem: (i) the momentum term is scaled with curvature information (in the form of the Hessian); (ii) the coefficients for the learning rate and the scaled momentum term are adaptively determined. The implementation of the algorithm requires minimal additional computation compared to a classical SGD-with-momentum iteration, since no explicit computation of the Hessian is needed: the algorithm only requires a Hessian-vector product. This product can be computed exactly and very efficiently within any modern computational graph framework such as, for example, TensorFlow. We report experiments with different neural network architectures trained on standard neural network benchmarks which demonstrate the efficiency of the proposed method.
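The efficiency claim rests on the fact that a Hessian-vector product can be obtained without ever materializing the Hessian. The sketch below illustrates this on a toy quadratic loss, using a finite-difference approximation of H·v (frameworks such as TensorFlow instead compute the product exactly via double backpropagation, at a similar cost of one extra gradient-like pass). The final update step is only a schematic illustration of an SGD-with-momentum rule whose momentum term is scaled by the Hessian; the fixed coefficients `lr` and `mu` are hypothetical placeholders, since HF-CSAM determines both adaptively.

```python
import numpy as np

# Toy quadratic loss 0.5 * w^T A w, chosen for illustration because its
# Hessian is the known constant matrix A, so the product can be checked.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad(w):
    # Gradient of the quadratic loss: A w.
    return A @ w

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    # Approximates H v as (grad(w + eps*v) - grad(w)) / eps.
    # This needs only two gradient evaluations -- the Hessian itself
    # is never formed.
    return (grad_fn(w + eps * v) - grad_fn(w)) / eps

w = np.array([1.0, -1.0])
prev_step = np.array([0.5, 2.0])   # previous weight update (momentum vector)

hv = hessian_vector_product(grad, w, prev_step)
# For this quadratic, the exact product is A @ prev_step = [3.5, 4.5].

# Schematic curvature-scaled momentum step (hypothetical fixed coefficients;
# HF-CSAM adapts these at every iteration).
lr, mu = 0.1, 0.05
step = -lr * grad(w) + mu * hv
w_next = w + step
```

The key point the abstract makes is visible here: the curvature-scaled momentum term needs only `hv`, a single Hessian-vector product, never the full Hessian matrix.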