Model-free Reinforcement Learning for Branching Markov Decision Processes

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

2021

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
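
To make the BMDP model described above concrete, the following is a minimal sketch of one generation of a toy BMDP in Python. It is an illustration only, not the algorithm or implementation from the paper: the table `TOY_BMDP`, the entity types `A` and `B`, the action names, the payoffs, and the function `step` are all hypothetical, and the controller is assumed to use a simple fixed choice of action per entity type.

```python
import random
from collections import Counter

# Hypothetical toy BMDP (an assumption for illustration). For each entity
# type and action: an immediate payoff and a list of
# (probability, offspring types) outcomes.
TOY_BMDP = {
    "A": {
        "split": (1.0, [(0.5, ("A", "B")), (0.5, ())]),
        "retire": (2.0, [(1.0, ())]),
    },
    "B": {
        "work": (0.5, [(0.7, ()), (0.3, ("B",))]),
    },
}


def step(population, policy, rng):
    """Advance every entity by one generation under a fixed policy.

    population: Counter mapping entity type -> number of entities
    policy: dict mapping entity type -> chosen action
    Returns (next population, payoff collected in this generation).
    """
    next_population = Counter()
    payoff = 0.0
    for entity_type, count in population.items():
        reward, outcomes = TOY_BMDP[entity_type][policy[entity_type]]
        weights = [probability for probability, _ in outcomes]
        for _ in range(count):
            # Each entity yields its payoff, then is replaced by its offspring.
            payoff += reward
            _, offspring = rng.choices(outcomes, weights=weights)[0]
            next_population.update(offspring)
    return next_population, payoff


rng = random.Random(0)
population = Counter({"A": 1})
policy = {"A": "retire", "B": "work"}
population, payoff = step(population, policy, rng)
```

Fixing the action for every type, as `policy` does here, turns the toy model into a BMC; the control problem consists of choosing those actions so that the accumulated payoff is as good (or as bad) as possible.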

Keywords:

Stochastic game
Computer science
Markov chain
Limit (mathematics)
Range (mathematics)
Optimal control
Mathematical optimization
Reinforcement learning
Probabilistic logic
Markov decision process
