An Approximately Optimal Relative Value Learning Algorithm for Averaged MDPs with Continuous States and Actions
2019
Designing algorithms for Markov decision processes (MDPs) with continuous states and actions that are provably approximately optimal, and that can approximate any MDP arbitrarily well, has long been a challenging problem. In this paper, we propose an empirical value learning algorithm for average MDPs with continuous states and actions that combines empirical value iteration with parametric function approximation and kernel density estimation of the transition probability distribution. We view each iteration as the application of a random operator and establish convergence using the probabilistic contraction analysis method that the authors (along with others) have recently developed.
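The abstract describes the algorithm only at a high level; the sketch below is a minimal, hypothetical illustration of how empirical value iteration, a parametric approximation of the relative value function, and kernel density estimation of the transition distribution could fit together for an average MDP. It is not the authors' exact method, and every concrete choice (the 1-D linear-Gaussian simulator sample_next_state, the reward function reward, the radial-basis feature family features, the grids, bandwidths, and the reference state s_ref used in the relative-value subtraction) is an illustrative assumption rather than something taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def reward(s, a):
    # Hypothetical reward: penalize distance from the origin and control effort.
    return -(s ** 2) - 0.1 * (a ** 2)

def sample_next_state(s, a, n):
    # Hypothetical linear-Gaussian dynamics standing in for an unknown simulator.
    return 0.9 * s + a + 0.1 * rng.standard_normal(n)

def features(s, centers, width=0.5):
    # Radial-basis features: a simple parametric family for the relative value function.
    return np.exp(-((s[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def kde_weights(samples, grid, bandwidth=0.2):
    # Gaussian kernel density estimate of the next-state distribution on a grid,
    # normalized so it can approximate E[V(s')] by a weighted sum over the grid.
    diffs = (grid[None, :] - samples[:, None]) / bandwidth
    dens = np.exp(-0.5 * diffs ** 2).mean(axis=0)
    return dens / dens.sum()

# Discretized grids used only to make the illustration concrete.
states = np.linspace(-2.0, 2.0, 41)
actions = np.linspace(-1.0, 1.0, 11)
centers = np.linspace(-2.0, 2.0, 9)
theta = np.zeros(len(centers))   # parameters of the approximate relative value function
s_ref = 0.0                      # reference state for the relative-value subtraction

for _ in range(50):              # empirical relative value iteration
    Phi = features(states, centers)
    V = Phi @ theta
    targets = np.empty_like(states)
    for i, s in enumerate(states):
        q_vals = []
        for a in actions:
            nxt = sample_next_state(s, a, n=30)      # empirical next-state samples
            w = kde_weights(nxt, states)             # KDE approximation of P(. | s, a)
            q_vals.append(reward(s, a) + w @ V)      # empirical Bellman backup
        targets[i] = max(q_vals)
    targets -= features(np.array([s_ref]), centers) @ theta   # subtract V(s_ref)
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)     # refit the parametric V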