On the Convergence of Block Coordinate Descent in Training DNNs with Tikhonov Regularization

2017 
By lifting the ReLU function into a higher-dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This formulation admits a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm converges globally to a stationary point with an R-linear convergence rate of order one. In experiments on the MNIST database, DNNs trained with this BCD algorithm consistently yielded lower test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
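To make the block structure concrete, here is a minimal NumPy sketch of a lifted-ReLU BCD loop. It is written under simplifying assumptions and is not the authors' code: it uses a squared loss and quadratic coupling penalties in place of the paper's exact Tikhonov-regularized lifting, and all names (`bcd_train`, `gamma`, `alpha`) are illustrative. It does, however, show the key property the abstract describes: each weight block is a closed-form ridge regression and each activation block is a nonnegative least-squares problem, so every block update is a well-behaved convex optimization.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def bcd_train(X, Y, dims, gamma=1.0, alpha=1e-3, outer=50, inner=20, seed=0):
    """Alternate convex block updates for a simplified lifted-ReLU objective:

        min_{W, U >= 0}  ||W_{L-1} U_{L-1} - Y||^2
                         + gamma * sum_l ||U_l - W_{l-1} U_{l-1}||^2
                         + alpha * sum_l ||W_l||^2,    with U_0 = X.

    X: (d0, n) inputs, Y: (dL, n) targets, dims: [d0, d1, ..., dL].
    (Assumed variant for illustration, not the paper's exact formulation.)
    """
    rng = np.random.default_rng(seed)
    L = len(dims) - 1
    W = [0.1 * rng.standard_normal((dims[l + 1], dims[l])) for l in range(L)]
    U = [X]                                # U[0] is the fixed input block
    for l in range(L - 1):                 # initialize lifted activations
        U.append(relu(W[l] @ U[-1]))

    for _ in range(outer):
        # Weight blocks: each is a ridge regression with a closed form,
        # solving min_W c * ||T - W A||^2 + alpha * ||W||^2.
        for l in range(L):
            A = U[l]
            T, c = (Y, 1.0) if l == L - 1 else (U[l + 1], gamma)
            G = A @ A.T + (alpha / c) * np.eye(A.shape[0])
            W[l] = np.linalg.solve(G, A @ T.T).T

        # Activation blocks: convex nonnegative least squares, solved here
        # approximately by a few projected-gradient steps.
        for l in range(1, L):
            Tn, c = (Y, 1.0) if l == L - 1 else (U[l + 1], gamma)
            Wn = W[l]
            step = 1.0 / (2.0 * (gamma + c * np.linalg.norm(Wn, 2) ** 2))
            for _ in range(inner):
                g = 2.0 * gamma * (U[l] - W[l - 1] @ U[l - 1]) \
                    + 2.0 * c * Wn.T @ (Wn @ U[l] - Tn)
                U[l] = np.maximum(U[l] - step * g, 0.0)  # project onto U_l >= 0
    return W

# Toy usage: fit random data with a two-hidden-layer network.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X, Y = rng.standard_normal((8, 100)), rng.standard_normal((3, 100))
    W = bcd_train(X, Y, dims=[8, 16, 16, 3])
```

Because the coupling variables U_l decouple the layers, every block subproblem above is convex even though the joint problem is not, which is what makes the convergence analysis via proximal point methods tractable.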