Addressing Extrapolation Error in Deep Offline Reinforcement Learning

2021 
Reinforcement learning (RL) encompasses both online and offline regimes. Unlike its online counterpart, offline RL trains agents using logged data only, without interaction with the environment. This makes offline RL a promising direction for real-world applications, such as healthcare, where repeated interaction with the environment is prohibitive. However, since offline RL losses often involve evaluating state-action pairs that are not well covered by the training data, they can suffer from the errors introduced when the function approximator attempts to extrapolate those pairs' values. These errors can be compounded by bootstrapping when the function approximator overestimates, leading the value function to *grow unbounded* and crippling learning. In this paper, we introduce a three-part solution to combat extrapolation errors: (i) behavior value estimation, (ii) ranking regularization, and (iii) reparametrization of the value function. We provide ample empirical evidence for the effectiveness of our method, showing state-of-the-art performance on the RL Unplugged (RLU) ATARI dataset. Furthermore, we introduce new datasets for bsuite as well as partially observable DeepMind Lab environments, on which our method outperforms state-of-the-art offline RL algorithms.
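
To make the extrapolation-error mechanism concrete, here is a minimal sketch of a bootstrapped Q-learning loss augmented with a ranking-style regularizer, in the spirit of component (ii) above. It is an illustrative assumption, not the paper's exact formulation: the function name `offline_q_losses`, the hinge-margin form of the regularizer, and the hyperparameters `margin` and `reg_weight` are all hypothetical. The `max` in the TD target is where values of poorly covered actions can be extrapolated and then compounded through bootstrapping; the regularizer pushes the Q-value of the logged action above the others to limit that reliance.

```python
import torch
import torch.nn.functional as F


def offline_q_losses(q_net, target_q_net, batch,
                     gamma=0.99, margin=0.1, reg_weight=1.0):
    """Hypothetical sketch: bootstrapped TD loss plus a ranking-style
    regularizer that encourages the logged (behavior) action's Q-value
    to rank above the Q-values of all other actions."""
    obs, actions, rewards, next_obs, dones = batch  # tensors from the logged dataset

    # Standard bootstrapped TD target; the max over actions is where
    # extrapolated (out-of-distribution) values can enter and be compounded.
    with torch.no_grad():
        next_q = target_q_net(next_obs).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q

    q_values = q_net(obs)                                    # [batch, num_actions]
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.smooth_l1_loss(q_taken, td_target)

    # Ranking-style regularizer (illustrative): hinge penalty whenever another
    # action's Q-value exceeds the logged action's Q-value by more than `margin`.
    mask = torch.ones_like(q_values).scatter_(1, actions.unsqueeze(1), 0.0)
    violations = F.relu(q_values - q_taken.unsqueeze(1) + margin) * mask
    rank_loss = violations.sum(dim=1).mean()

    return td_loss + reg_weight * rank_loss
```

The design choice to penalize only margin violations (rather than all non-logged actions uniformly) keeps the regularizer from fighting the TD loss when the ordering of Q-values is already consistent with the logged behavior.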