Approximate Modified Policy Iteration

Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Inria Lille, Matthieu Geist

2012

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. For the classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation.
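To illustrate how a single parameter can interpolate between value iteration and policy iteration, here is a minimal sketch of exact (tabular) MPI. It is not the paper's approximate algorithm: the function name, the array layout of `P` and `R`, and the parameter name `m` (the number of applications of the policy's Bellman operator per iteration) are assumptions made for this example. Under this formulation, `m = 1` gives value iteration, and letting `m` grow large approaches policy iteration.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma, m, n_iters=200):
    """Tabular MPI sketch (illustrative, not the paper's AMPI).

    P: array of shape (A, S, S), P[a, s, t] = Pr(next state t | state s, action a)
    R: array of shape (S, A), expected immediate reward
    gamma: discount factor in [0, 1)
    m: number of partial-evaluation sweeps per iteration (assumed name)
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(n_iters):
        # Greedy step: pick the action maximizing the one-step lookahead on v.
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        policy = q.argmax(axis=1)
        # Partial evaluation: apply the Bellman operator of `policy` m times.
        P_pi = P[policy, states]   # shape (S, S), row s is P[policy[s], s, :]
        r_pi = R[states, policy]   # shape (S,)
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return v, policy
```

Calling the function with `m=1` and with a larger `m` on the same small MDP should return the same greedy policy, with the larger `m` typically needing fewer outer iterations to reach a given accuracy.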

Keywords:

Propagation of uncertainty
Power iteration
Bellman equation
Fixed-point iteration
Mathematical optimization
Markov decision process
Dynamic programming
Generality
Implementation
Computer science
Applied mathematics
