Plenary lecture 8: neuro-control of vibration for a maglev vehicle traveling at high speeds
2009
The standard approach to control and optimization of stochastic systems is based on dynamic programming. This approach works backward in time and it treats the infinite-horizon problems as the limiting case with the backward time going to infinity. Optimality equations are first derived and then it is proved that the solutions to the optimality equations indeed lead to optimal policies. When the value functions are not differentiable, the concept of viscosity solutions is introduced.
A sensitivity-based approach has been developed recently to stochastic learning and optimization. The approach was first developed for discrete event dynamic systems and is being extended to continuous-time and continuous state systems. The basic idea is: fundamentally, one can only compare the performance of two policies at a time; and therefore, when developing optimization theories and methodologies, one has to first study the different of the performance of any two polices. It has been observed that for many stochastic optimization problems the information about the difference of the performance of any two policies can be decomposed into two factors, each of them is associated with one of the policies only. This feature allows us to find a better policy than the policy we are evaluating without analyzing any other policies. We found that this "information decomposition" is essential for optimization.
This "information decomposition" approach has some advantages over the dynamic programming approach: It is intuitive clear because it is based on a direct comparison of any two policies. Thus, it is easy to verify that the solution to the optimality equation is indeed optimal; viscosity solution is not needed. This approach applies in the same way to different performance criteria, including finite and infinite-horizon problems. Furthermore, the approach brings some new insights that leads to new methods and results in control and optimization, including the event-based optimization and gradient-based learning.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
0
References
0
Citations
NaN
KQI