A performance gradient perspective on gradient‐based policy iteration and a modified value iteration
2008
Purpose – The purpose of this paper is to develop a mathematical framework to address some algorithmic features of approximate dynamic programming (ADP) by using an average cost formulation based on the concepts of differential costs and performance gradients. Under such a framework, a modified value iteration algorithm is developed that is easy to implement, in the mean time it can address a class of partially observable Markov decision processes (POMDP).Design/methodology/approach – Gradient‐based policy iteration (GBPI) is a top‐down, system‐theoretic approach to dynamic optimization with performance guarantees. In this paper, a bottom‐up, algorithmic view is provided to complement the original high‐level development of GBPI. A modified value iteration is introduced, which can provide solutions to the same type of POMDP problems dealt with by GBPI. Numerical simulations are conducted to include a queuing problem and a maze problem to illustrate and verify features of the proposed algorithms as compared...
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
17
References
3
Citations
NaN
KQI