A family of robust second order training algorithms
2009
Starting from the concept of equivalent networks, a framework is established for analyzing the effect of linear dependence on the training of a multilayer perceptron. Detailed mathematical analyses show that backpropagation and Newton's method behave differently in the presence of linear dependence.
Two effective batch training algorithms are developed for the multilayer perceptron. First, the optimal input gain algorithm is presented, which computes an optimal gain coefficient for each input and uses it to update the input weights. The motivation for this algorithm comes from using equivalent networks to analyze the effect of input transformations. It is shown that applying a non-orthogonal, non-singular diagonal transformation matrix to the inputs is equivalent to altering the input gains of the network. Newton's method is used to solve simultaneously for the input gains and an optimal learning factor. In several examples, it is shown that the resulting algorithm is a reasonable compromise between first order training methods and Levenberg-Marquardt.
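To make the input-gain equivalence concrete, here is a minimal sketch in our own notation (the symbols below are illustrative and are not taken from the dissertation). If the inputs of pattern p are scaled by a non-singular diagonal matrix A = diag(a_1, ..., a_N), a network with input weight matrix W acting on the transformed inputs produces the same hidden-unit net functions as a network with input weights WA acting on the original inputs:

    n_p = W (A x_p) = (W A) x_p .

Choosing a separate gain a_i for each input is therefore equivalent to rescaling the i-th column of W. Treating the gains and the learning factor as the only free variables of a second order update then leads to a small Newton system in roughly N + 1 unknowns (one gain per input plus the learning factor), rather than a Newton step over all input weights.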
Second, a multiple optimal learning factor algorithm is developed, which assigns a separate learning factor to each hidden unit. The idea stems from relating a single optimal learning factor to Newton's method and is then extended to estimate a separate optimal learning factor for each hidden unit. In several examples, this method performs as well as or better than Levenberg-Marquardt.
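A minimal sketch of the per-hidden-unit update, again in our own notation: let g(j, i) = -∂E/∂w(j, i) denote the negative gradient of the training error E with respect to the weight connecting input i to hidden unit j. Instead of a single learning factor z, each hidden unit j receives its own factor z_j:

    w(j, i) ← w(j, i) + z_j g(j, i).

Expanding E(z_1, ..., z_H) to second order in the vector z and setting its gradient to zero gives an H × H Newton system

    H_z z = g_z,   with (g_z)_j = -∂E/∂z_j  and  (H_z)_{jk} = ∂²E/(∂z_j ∂z_k),

so only H unknowns (one per hidden unit) are solved for in each iteration, rather than one global learning factor or a full Newton step over all input weights.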
Both methods yield a smaller Hessian than Newton's method for updating the input weights. The resulting Hessian matrix is less susceptible to linear dependence and leads to fast convergence. It is shown that the elements of the Hessian matrix for both methods are weighted combinations of elements of the full network's Hessian.
When used with backpropagation-type learning, the two proposed methods are limited by the presence of dependent inputs. However, when combined with the hidden weight optimization technique, both methods are shown to overcome dependent inputs and to effectively ignore them during training. This improvement results in two highly robust second order learning algorithms that are less heuristic, less susceptible to an ill-conditioned Hessian, immune to linear dependencies, faster than Levenberg-Marquardt, and superior to standard first order training methods.
In the last part, a new approach for modeling simple discontinuous functions is developed. In its first stage, this two-stage approach trains two separate networks, one on the continuous component and another on a discrete step function; in its second stage, it fuses the two trained networks to obtain a final network capable of modeling the discontinuous function. Results of using the proposed second order methods to train and fuse networks for simple discontinuous sine and ramp functions are presented.
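As an illustration of the two-stage idea only, the following Python sketch splits a discontinuous target into a continuous part and a step part, trains a small network on each, and then combines them. The decomposition, the plain gradient-descent training (standing in for the proposed second order methods), and the fusion-by-summation are all our own simplifying assumptions, not the dissertation's procedure.

# Minimal two-stage sketch: one MLP per component, then a simple fusion.
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    """One-hidden-layer MLP with tanh hidden units and a linear output."""
    return {
        "W1": rng.normal(0, 0.5, (n_hidden, n_in + 1)),   # +1 for bias column
        "W2": rng.normal(0, 0.5, (n_out, n_hidden + 1)),
    }

def forward(net, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    H = np.tanh(Xb @ net["W1"].T)
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])
    return Hb @ net["W2"].T, (Xb, H, Hb)

def train(net, X, y, epochs=2000, lr=0.05):
    """Plain batch gradient descent stands in for the second order methods."""
    for _ in range(epochs):
        yhat, (Xb, H, Hb) = forward(net, X)
        err = yhat - y
        gW2 = err.T @ Hb / len(X)
        dH = (err @ net["W2"][:, :-1]) * (1 - H ** 2)
        gW1 = dH.T @ Xb / len(X)
        net["W2"] -= lr * gW2
        net["W1"] -= lr * gW1
    return net

# Discontinuous target: a sine with a unit jump at x = 0 (illustrative only).
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y_cont = np.sin(X)                      # continuous component
y_step = (X >= 0).astype(float)         # discrete step component
y_total = y_cont + y_step

# Stage 1: train one network per component.
net_cont = train(init_mlp(1, 8, 1), X, y_cont)
net_step = train(init_mlp(1, 8, 1), X, y_step)

# Stage 2: "fuse" the trained networks (here, simply by adding their outputs).
y_fused = forward(net_cont, X)[0] + forward(net_step, X)[0]
print("RMS error of fused model:", np.sqrt(np.mean((y_fused - y_total) ** 2)))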