Offline RL With Resource Constrained Online Deployment
2021
Offline reinforcement learning is used to train policies in scenarios where
real-time access to the environment is expensive or impossible. As a natural
consequence of these harsh conditions, an agent may lack the resources to fully
observe the online environment before taking an action. We dub this situation
the resource-constrained setting. This leads to situations where the offline
dataset (available for training) can contain fully processed features (using
powerful language models, image models, complex sensors, etc.) which are not
available when actions are actually taken online. This disconnect leads to an
interesting and unexplored problem in offline RL: Is it possible to use a
richly processed offline dataset to train a policy which has access to fewer
features in the online environment? In this work, we introduce and formalize
this novel resource-constrained problem setting. We highlight the performance
gap between policies trained using the full offline dataset and policies
trained using limited features. We address this performance gap with a policy
transfer algorithm which first trains a teacher agent using the offline dataset
where features are fully available, and then transfers this knowledge to a
student agent that only uses the resource-constrained features. To better
capture the challenge of this setting, we propose a data collection procedure:
Resource Constrained-Datasets for RL (RC-D4RL). We evaluate our transfer
algorithm on RC-D4RL and the popular D4RL benchmarks and observe consistent
improvement over the baseline (TD3+BC without transfer). The code for the
experiments is available at
https://github.com/JayanthRR/RC-OfflineRL.
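The teacher-student transfer idea described above can be illustrated with a minimal sketch. Below, a teacher policy is fit on the fully processed offline features and a student policy that only sees the resource-constrained feature subset is trained to imitate it. The network sizes, the feature split, the plain behavior-cloning teacher objective, and all variable names are illustrative assumptions, not the paper's exact TD3+BC-based procedure.

```python
# Hypothetical sketch of teacher-student transfer under resource constraints.
import torch
import torch.nn as nn

FULL_DIM, LIMITED_DIM, ACT_DIM = 32, 12, 6  # assumed dimensions

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim), nn.Tanh())

teacher = mlp(FULL_DIM, ACT_DIM)      # sees fully processed offline features
student = mlp(LIMITED_DIM, ACT_DIM)   # sees only features available online

# Stand-in offline dataset: full-feature states and logged actions.
states_full = torch.randn(1024, FULL_DIM)
actions = torch.rand(1024, ACT_DIM) * 2 - 1
states_limited = states_full[:, :LIMITED_DIM]  # assumed feature subset

# Stage 1: fit the teacher on the rich offline data (behavior cloning here;
# the paper trains the teacher with an offline RL objective such as TD3+BC).
opt_t = torch.optim.Adam(teacher.parameters(), lr=3e-4)
for _ in range(200):
    loss = nn.functional.mse_loss(teacher(states_full), actions)
    opt_t.zero_grad(); loss.backward(); opt_t.step()

# Stage 2: distill the teacher's policy into the resource-constrained student.
opt_s = torch.optim.Adam(student.parameters(), lr=3e-4)
for _ in range(200):
    with torch.no_grad():
        target = teacher(states_full)
    loss = nn.functional.mse_loss(student(states_limited), target)
    opt_s.zero_grad(); loss.backward(); opt_s.step()

# At deployment, only `student` is used, acting on the limited online features.
```

In this sketch the student never needs the rich features at deployment time, which is the point of the setting: the expensive processing is exploited only during offline training.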