Entropy-SGD: Biasing gradient descent into wide valleys. In International Conference on Learning Representations (ICLR), 2017.

References