Safely Bridging Offline and Online Reinforcement Learning.

Wanqiao Xu,Kan Xu,Hamsa Bastani,Osbert Bastani

2021

A key challenge in deploying reinforcement learning in practice is exploring safely. We propose a natural safety property: uniformly outperforming a conservative policy (adaptively estimated from all data observed thus far), up to a per-episode exploration budget. We then design an algorithm that uses a UCB reinforcement learning policy for exploration, but overrides it as needed to ensure safety with high probability. We experimentally validate our results on a sepsis treatment task, demonstrating that our algorithm can learn while ensuring good performance compared to the baseline policy for every patient.
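The override idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how the override decision is made, so the rule below (follow the exploratory action only while a per-episode budget covers its worst-case loss relative to the conservative policy) is an assumption, and the names `run_episode`, `worst_case_loss`, and `budget` are hypothetical.

```python
def run_episode(steps, budget):
    """Illustrative budget-gated override (an assumed rule, not the paper's).

    steps  : iterable of (explore_action, baseline_action, worst_case_loss),
             where worst_case_loss is a hypothetical high-probability bound
             on how much return the exploratory action could lose relative
             to the conservative (baseline) policy at that step.
    budget : per-episode exploration budget, in the same units of return.
    """
    remaining = budget
    chosen = []
    for explore_action, baseline_action, worst_case_loss in steps:
        if explore_action == baseline_action:
            # Agreeing with the baseline costs no budget.
            chosen.append(baseline_action)
        elif worst_case_loss <= remaining:
            # Exploration is affordable: take it and charge the budget.
            remaining -= worst_case_loss
            chosen.append(explore_action)
        else:
            # Exploration could exceed the budget: override with the baseline.
            chosen.append(baseline_action)
    return chosen, remaining


if __name__ == "__main__":
    steps = [("a", "b", 0.5), ("c", "b", 0.75), ("d", "b", 0.25), ("b", "b", 0.0)]
    actions, left = run_episode(steps, budget=1.0)
    print(actions, left)
```

In this toy run the second exploratory action is overridden because its assumed worst-case loss exceeds what is left of the budget, while the cheaper third one is still allowed.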

Keywords:

task
Machine learning
Safety property
Computer science
Key (cryptography)
high probability
Baseline (configuration management)
Reinforcement learning
Artificial intelligence
Bridging (networking)
