Information theoretic derivation of network architecture and learning algorithms

1991 
Using variational techniques, the authors derive a feedforward network architecture that minimizes a least-squares cost function subject to the soft constraint that the mutual information between input and output is maximized, permitting optimum generalization for a given accuracy. The architecture resembles local radial basis function networks with two important modifications: a normalization that greatly reduces the data requirements, and an extra set of gradient-style weights that improves interpolation. The linear weights are learned by linear Kalman filtering, and gradient descent on the composite cost function yields a learning rule that adjusts the basis function widths for good generalization. The resulting network and learning algorithms are tested on a set of problems emphasizing time series prediction.
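A minimal sketch, assuming Gaussian basis functions, of the normalized RBF architecture with per-unit "gradient style" (slope) weights described above, together with a scalar-observation linear Kalman filter step for the linear parameters. All names and the toy data are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def features(x, centers, widths):
    """Return the regressor h(x) for y = h(x) . theta, where theta stacks the
    constant weights w_i and the slope (gradient style) weights g_i."""
    diff = x - centers                                       # (n, d)
    phi = np.exp(-np.sum(diff**2, axis=1) / (2.0 * widths**2))
    phi_norm = phi / np.sum(phi)                             # normalization step
    # Constant terms followed by the locally linear terms g_i . (x - c_i).
    return np.concatenate([phi_norm, (phi_norm[:, None] * diff).ravel()])

def kalman_step(theta, P, h, y_target, r=1e-2):
    """One linear Kalman filter (recursive least squares) update of theta,
    with measurement noise variance r and parameter covariance P."""
    Ph = P @ h
    k = Ph / (h @ Ph + r)                                    # Kalman gain
    theta = theta + k * (y_target - h @ theta)
    P = P - np.outer(k, Ph)
    return theta, P

# Toy usage: fit a 1-D function online with 5 normalized Gaussian units.
rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
widths = np.full(5, 0.2)
theta = np.zeros(5 + 5 * 1)                                  # w_i and g_i stacked
P = np.eye(theta.size) * 10.0                                # prior covariance

for _ in range(200):
    x = rng.uniform(0.0, 1.0, size=1)
    y = np.sin(2.0 * np.pi * x[0])                           # target function
    theta, P = kalman_step(theta, P, features(x, centers, widths), y)

x_test = np.array([0.25])
print(features(x_test, centers, widths) @ theta, np.sin(2.0 * np.pi * 0.25))
```

The width adaptation described in the abstract (gradient descent on the composite cost) is omitted here; the widths are held fixed in this sketch.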