Reinforcement Learning Algorithms in Markov Decision Processes AAAI-10 Tutorial Part IV: Take home message

2010 
• Uses importance sampling to convert the off-policy case to the on-policy case (a minimal sketch appears after the option formalism below)
• Convergence assured by a theorem of Tsitsiklis & Van Roy (1997)
• Survives the Bermuda triangle!

BUT!
• Variance can be high, even infinite (slow learning)
• Difficult to use with continuous or large action spaces
• Requires an explicit representation of the behavior policy (as a probability distribution)

Option formalism

An option is defined as a triple o = ⟨I, π, β⟩, where
• I is the set of states in which the option can be initiated
• π is the internal policy of the option
• β : S → [0, 1] is a stochastic termination condition

We want to compute the reward model of option o:

    E_o{R(s)} = E{r_1 + r_2 + … + r_T | s_0 = s, π, β}
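The bullets above summarize how importance sampling reweights returns collected under a behavior policy so they estimate values under a target policy. The following is a minimal, illustrative sketch of ordinary importance sampling for off-policy Monte Carlo evaluation; it is not the tutorial's algorithm, and the names `target_pi`, `behavior_mu`, and the episode format are assumptions made here for illustration.

```python
# A minimal sketch (not the tutorial's method) of ordinary importance sampling
# for off-policy Monte Carlo evaluation.  `target_pi` and `behavior_mu` are
# assumed callables mapping (state, action) -> probability; each episode is a
# list of (state, action, reward) tuples collected by following behavior_mu.

def is_value_estimate(episodes, target_pi, behavior_mu, gamma=1.0):
    """Estimate the start-state value under target_pi from behavior_mu data."""
    estimates = []
    for episode in episodes:
        rho = 1.0       # cumulative importance-sampling ratio pi/mu
        g = 0.0         # discounted return of this episode
        discount = 1.0
        for state, action, reward in episode:
            # Each factor corrects one action choice made by the behavior policy.
            rho *= target_pi(state, action) / behavior_mu(state, action)
            g += discount * reward
            discount *= gamma
        estimates.append(rho * g)   # weight the whole return by the ratio
    return sum(estimates) / len(estimates)
```

The product of per-step ratios is what drives the high (even infinite) variance noted above, and the call to `behavior_mu(state, action)` shows why an explicit probability representation of the behavior policy is required.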
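To make the option triple and its reward model concrete, here is a hedged sketch: a small container for ⟨I, π, β⟩ and a Monte Carlo estimate of E_o{R(s)} as defined above. The `Option` class, the `env_step(state, action) -> (next_state, reward)` interface, and the function names are assumptions introduced here, not part of the tutorial.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Set

@dataclass
class Option:
    """An option o = <I, pi, beta> (names assumed for illustration)."""
    initiation_set: Set[Any]             # I: states where the option can start
    policy: Callable[[Any], Any]         # pi: internal policy, state -> action
    termination: Callable[[Any], float]  # beta: state -> termination probability

def option_reward_model(option, env_step, s0, n_episodes=1000):
    """Monte Carlo estimate of E_o{R(s0)} = E{r_1 + ... + r_T | s_0 = s0, pi, beta}."""
    assert s0 in option.initiation_set, "an option can only be initiated in I"
    total = 0.0
    for _ in range(n_episodes):
        s, ret = s0, 0.0
        while True:
            s, r = env_step(s, option.policy(s))  # follow the internal policy pi
            ret += r                              # accumulate reward, undiscounted as above
            if random.random() < option.termination(s):  # terminate with probability beta(s)
                break
        total += ret
    return total / n_episodes
```

The termination check mirrors β : S → [0, 1]: after each step the option ends with probability β(s) in the state just reached, so T is the (random) step at which the option terminates.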