An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space
2019
We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with continuous state space, a finite action set, and the average-reward criterion. The ERVL algorithm relies on function approximation via nearest neighbors and on minibatch samples for the value function update. It is universal (it works for any such MDP), computationally simple, and yet provides an arbitrarily good approximation with high probability in finite time. To the best of our knowledge, this is the first algorithm for non-parametric, continuous-state-space MDPs under the average-reward criterion with these provable properties. Numerical evaluation on a benchmark optimal replacement problem suggests good performance.
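The abstract gives no pseudocode, but the ingredients it names (nearest-neighbor function approximation, minibatch empirical expectations, and a relative value update for the average-reward criterion) can be illustrated with a minimal sketch. Everything below is an assumption-laden illustration, not the paper's algorithm: the generative simulator `step(x, a)`, the sampled state set, the reference state `x_ref`, and all hyperparameters (`m`, `k`) are hypothetical placeholders.

```python
import numpy as np

def knn_value(x, states, h, k=5):
    """Nearest-neighbor value estimate: average h over the k sampled states closest to x."""
    dists = np.linalg.norm(states - x, axis=1)
    idx = np.argsort(dists)[:k]
    return h[idx].mean()

def ervl_sweep(states, h, actions, step, x_ref, m=10, k=5):
    """One empirical relative value iteration sweep over the sampled states.

    states : (N, d) array of sampled states
    h      : (N,) current relative value estimates at those states
    step   : generative model, step(x, a) -> (reward, next_state)  [assumed]
    """
    h_new = np.empty_like(h)
    for i, x in enumerate(states):
        q_values = []
        for a in actions:
            # Minibatch estimate of reward plus next-state value for action a.
            total = 0.0
            for _ in range(m):
                r, x_next = step(x, a)
                total += r + knn_value(x_next, states, h, k)
            q_values.append(total / m)
        h_new[i] = max(q_values)
    # Relative value step: subtract the value at a fixed reference state so the
    # iterates stay bounded, as is standard under the average-reward criterion.
    h_new -= knn_value(x_ref, states, h_new, k)
    return h_new
```

Repeating `ervl_sweep` until the iterates stabilize would, in this stylized setting, yield an approximate relative value function; the quality of the approximation would depend on the number of sampled states, the minibatch size `m`, and the neighbor count `k`.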