Safe Reinforcement Learning via Statistical Model Predictive Shielding

Osbert Bastani, Shuo Li, Anton Xue

2021

Reinforcement learning is a promising approach to solving hard robotics tasks. An important challenge is ensuring safety—e.g., that a walking robot does not fall over or an autonomous car does not crash into an obstacle. We build on an approach that composes the learned policy with a backup policy—it uses the learned policy on the interior of the region where the backup policy is guaranteed to be safe, and switches to the backup policy on the boundary of this region. The key challenge is checking when the backup policy is guaranteed to be safe. Our algorithm, statistical model predictive shielding (SMPS), uses sampling-based verification and linear systems analysis to perform this check. We prove that SMPS ensures safety with high probability, and empirically evaluate its performance on several benchmarks.
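The switching rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SMPS implementation: the toy 1-D dynamics, the constants, and every function name are assumptions made for illustration, and a deterministic rollout of the backup policy stands in for the sampling-based verification and linear systems analysis that the abstract mentions but does not detail.

```python
# Toy shielding sketch. All names and constants are hypothetical.
DT = 0.1      # time step (assumed)
WALL = 1.0    # positions beyond this value count as a crash (assumed)
HORIZON = 50  # length of the backup rollout used by the check (assumed)

def step(state, u):
    """Point mass on a line: state is (position, velocity), u is acceleration."""
    p, v = state
    return (p + v * DT, v + u * DT)

def is_safe(state):
    return state[0] <= WALL

def learned_policy(state):
    # Stand-in for a learned policy: always accelerate toward the wall.
    return 1.0

def backup_policy(state):
    # Brake as hard as the limit |u| <= 1 allows, without overshooting zero velocity.
    _, v = state
    return -max(-1.0, min(1.0, v / DT))

def is_recoverable(state):
    """True if the backup policy keeps the system safe from `state` and brings it to rest."""
    x = state
    for _ in range(HORIZON):
        if not is_safe(x):
            return False
        x = step(x, backup_policy(x))
    return is_safe(x) and abs(x[1]) < 1e-9

def shielded_policy(state):
    # Use the learned action only if the backup policy can still recover afterwards.
    u = learned_policy(state)
    if is_recoverable(step(state, u)):
        return u
    return backup_policy(state)

def simulate(policy, state=(0.0, 0.0), steps=200):
    trajectory = [state]
    for _ in range(steps):
        state = step(state, policy(state))
        trajectory.append(state)
    return trajectory
```

Calling `simulate(shielded_policy)` keeps the point mass at or behind the wall, while `simulate(learned_policy)` drives it through. The shield stays safe because it starts from a recoverable state and only ever moves to a state from which the backup policy is known to stop in time.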

Keywords:

Computer science
Robotics
Statistical model
Reinforcement learning
Reliability engineering
Backup
Obstacle
Linear system
Artificial intelligence
Robot