Transferable Environment Poisoning: Training-time Attack on Reinforcement Learning
2021
Studying adversarial attacks on Reinforcement Learning (RL) agents has become a key aspect of developing robust, RL-based solutions. Test-time attacks, which target the post-learning performance of an RL agent's policy, have been well studied in both white- and black-box settings. More recently, however, state-of-the-art (SOTA) works have shifted to investigating training-time attacks on RL agents, i.e., forcing the learning process towards a target policy designed by the attacker. Alas, these SOTA works continue to rely on white-box settings and/or use a reward-poisoning approach. In contrast, this paper studies environment-dynamics poisoning attacks at training time. Furthermore, while environment-dynamics poisoning presumes a transfer-learning-capable agent, it also allows us to expand our approach to black-box attacks. Our overall framework, inspired by hierarchical RL, seeks the minimal environment-dynamics manipulation that will prompt the momentary policy of the agent to change in a desired manner. We demonstrate the attack's efficiency by comparing it with the reward-poisoning approach, and empirically demonstrate the transferability of the environment-poisoning attack strategy. Finally, we seek to exploit the transferability of the attack strategy to handle black-box settings.
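To make the notion of a "minimal environment-dynamics manipulation" concrete, here is a toy sketch. It is not the paper's hierarchical framework. It assumes a hypothetical two-state, two-action tabular MDP and a victim that plans greedily by value iteration, which stands in for the learning agent attacked at training time. The names `greedy_policy` and `minimal_poison`, the grid search over the perturbation size, and all numbers are assumptions made purely for illustration.

```python
import numpy as np

def greedy_policy(P, R, gamma=0.9, iters=500):
    """Greedy policy from value iteration; P and R have shape (S, A, S)."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        # Q[s, a] = sum over s' of P[s, a, s'] * (R[s, a, s'] + gamma * V[s'])
        Q = np.einsum("sat,sat->sa", P, R + gamma * V[None, None, :])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def minimal_poison(P, R, state, target_action, step=0.05):
    """Smallest grid perturbation eps that flips the greedy action at `state`.

    Assumption: the attacker may only move probability mass of the victim's
    currently preferred action from successor state 1 to successor state 0.
    Returns None if no perturbation on the grid achieves the target.
    """
    current = int(greedy_policy(P, R)[state])
    k = 0
    while k * step <= P[state, current, 1] + 1e-12:
        eps = k * step
        P_poisoned = P.copy()
        P_poisoned[state, current, 1] -= eps
        P_poisoned[state, current, 0] += eps
        P_poisoned = np.clip(P_poisoned, 0.0, 1.0)  # guard against round-off
        if greedy_policy(P_poisoned, R)[state] == target_action:
            return eps
        k += 1
    return None

# Hypothetical environment: reaching state 1 pays reward 1, state 1 is absorbing.
P = np.zeros((2, 2, 2))
P[0, 0] = [0.10, 0.90]   # action 0 in state 0: usually reaches state 1
P[0, 1] = [0.48, 0.52]   # action 1 in state 0: less reliable
P[1, :, 1] = 1.0
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0

clean_action = int(greedy_policy(P, R)[0])
eps = minimal_poison(P, R, state=0, target_action=1)
print(clean_action, eps)
```

In this toy setting the attacker never touches the reward signal, only the transition probabilities, which is the distinction the abstract draws between environment-dynamics poisoning and reward poisoning. A real training-time attack would interleave such manipulations with the victim's learning updates rather than re-solving the MDP from scratch.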