Reward-Weighted Regression Converges to a Global Optimum.

Miroslav Strupl, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber

2021

Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy have remained open questions. In this paper, we provide the first proof that RWR converges to a global optimum when no function approximation is used.
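The iteration described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' code and rests on several assumptions: the 2-state, 2-action MDP, its positive rewards, the discount factor, and all function names are hypothetical; sampled trajectories are replaced by exact policy evaluation; and the fitting step uses the closed-form maximizer of a weighted log-likelihood over an unconstrained tabular policy, which is proportional to pi(a|s) * Q(s,a).

```python
# Tabular RWR sketch on a hypothetical MDP (all numbers are illustrative assumptions).
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2
# REWARD[s][a] > 0: returns act as regression weights, so rewards are assumed positive.
REWARD = [[1.0, 2.0], [0.5, 1.5]]
# NEXT_STATE_PROBS[s][a][s2]: here action a moves deterministically to state a.
NEXT_STATE_PROBS = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]


def q_from_v(v):
    """One-step lookahead: Q(s,a) = R(s,a) + gamma * E[V(s')]."""
    return [[REWARD[s][a] + GAMMA * sum(p * v[s2]
                                        for s2, p in enumerate(NEXT_STATE_PROBS[s][a]))
             for a in range(N_ACTIONS)] for s in range(N_STATES)]


def evaluate(policy, sweeps=500):
    """Policy evaluation by fixed-point iteration (stands in for sampled returns)."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        q = q_from_v(v)
        v = [sum(policy[s][a] * q[s][a] for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    return q_from_v(v), v


def rwr_step(policy, q):
    """Weighted maximum-likelihood fit: new pi(a|s) is proportional to pi(a|s) * Q(s,a)."""
    new_policy = []
    for s in range(N_STATES):
        weights = [policy[s][a] * q[s][a] for a in range(N_ACTIONS)]
        total = sum(weights)  # equals V(s) when q is exact
        new_policy.append([w / total for w in weights])
    return new_policy


def run_rwr(iterations=300):
    """Start from the uniform policy and record V after each evaluation."""
    policy = [[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]
    values = []
    for _ in range(iterations):
        q, v = evaluate(policy)
        values.append(v)
        policy = rwr_step(policy, q)
    return policy, values


def optimal_values(sweeps=500):
    """Value iteration, used only as a reference point for comparison."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [max(row) for row in q_from_v(v)]
    return v
```

Calling `run_rwr()` and comparing the last entry of `values` with `optimal_values()` gives a quick numerical check of the two properties the abstract mentions, monotonic improvement and convergence toward the optimum, on this toy problem only.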

Keywords:

Reinforcement learning
Regression
Function approximation
Mathematics
Global optimum
Monotonic function
Unit-weighted regression
Sampling (statistics)
Mathematical optimization
Current (mathematics)
