A performance gradient perspective on gradient‐based policy iteration and a modified value iteration

Lei Yang, James Dankert, Jennie Si

2008

Purpose – The purpose of this paper is to develop a mathematical framework to address some algorithmic features of approximate dynamic programming (ADP) by using an average cost formulation based on the concepts of differential costs and performance gradients. Under this framework, a modified value iteration algorithm is developed that is easy to implement and can also address a class of partially observable Markov decision processes (POMDPs).

Design/methodology/approach – Gradient‐based policy iteration (GBPI) is a top‐down, system‐theoretic approach to dynamic optimization with performance guarantees. In this paper, a bottom‐up, algorithmic view is provided to complement the original high‐level development of GBPI. A modified value iteration is introduced, which can provide solutions to the same type of POMDP problems dealt with by GBPI. Numerical simulations, including a queuing problem and a maze problem, are conducted to illustrate and verify features of the proposed algorithms as compared...
