Transferable Environment Poisoning: Training-time Attack on Reinforcement Learning

2021 
Studying adversarial attacks on Reinforcement Learning (RL) agents has become a key aspect of developing robust, RL-based solutions. Test-time attacks, which target the post-learning performance of an RL agent's policy, have been well studied in both white-box and black-box settings. More recently, state-of-the-art work has shifted toward training-time attacks on RL agents, i.e., forcing the learning process towards a target policy designed by the attacker. However, these works still rely on white-box settings and/or a reward-poisoning approach. In contrast, this paper studies environment-dynamics poisoning attacks at training time. While environment-dynamics poisoning presumes an agent capable of transfer learning, it also allows us to extend our approach to black-box attacks. Our overall framework, inspired by hierarchical RL, seeks the minimal environment-dynamics manipulation that prompts the agent's momentary policy to change in a desired manner. We demonstrate the attack's efficiency by comparing it with the reward-poisoning approach, and empirically demonstrate the transferability of the environment-poisoning attack strategy. Finally, we exploit that transferability to handle black-box settings.
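To make the idea of environment-dynamics poisoning concrete, here is a minimal sketch on a toy chain MDP. This construction is our own illustration, not the paper's framework or benchmark: the attacker leaves the reward function untouched and degrades only the transition dynamics of one action (the "right" action slips and stays in place with high probability), which is enough to flip the optimal policy from the high-reward goal to the attacker's target direction.

```python
import numpy as np

# Toy chain MDP: states 0..4, with 0 and 4 terminal. Actions: 0 = left, 1 = right.
# Entering state 0 pays +1; entering state 4 pays +10, so the clean optimum is
# "go right". The (hypothetical) attacker poisons ONLY the dynamics: the right
# action now succeeds with a small probability and otherwise stays in place.
N_STATES, GAMMA = 5, 0.9

def build_dynamics(p_right_success=1.0):
    """Return transition tensor P[s, a, s'] and expected reward R[s, a]."""
    P = np.zeros((N_STATES, 2, N_STATES))
    R = np.zeros((N_STATES, 2))
    for s in range(1, N_STATES - 1):
        P[s, 0, s - 1] = 1.0                      # left always works
        P[s, 1, s + 1] = p_right_success          # right may "slip"...
        P[s, 1, s] = 1.0 - p_right_success        # ...and stay in place
        R[s, 0] = 1.0 if s - 1 == 0 else 0.0
        # Reward for reaching the right terminal, weighted by success prob.
        R[s, 1] = 10.0 * p_right_success if s + 1 == N_STATES - 1 else 0.0
    for t in (0, N_STATES - 1):                   # terminals self-loop
        P[t, :, t] = 1.0
    return P, R

def greedy_policy(P, R, iters=500):
    """Value iteration, then the greedy action per state."""
    terminal = np.zeros(N_STATES, dtype=bool)
    terminal[[0, N_STATES - 1]] = True
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = R + GAMMA * (P @ V)                   # Q[s, a]
        V = np.where(terminal, 0.0, Q.max(axis=1))
    return Q.argmax(axis=1)

clean = greedy_policy(*build_dynamics(1.0))       # unpoisoned environment
poisoned = greedy_policy(*build_dynamics(0.003))  # dynamics-only poisoning
print("clean policy   :", clean[1:4])             # all 1s: go right
print("poisoned policy:", poisoned[1:4])          # all 0s: go left
```

Note the attack never touches the reward signal: discounting alone makes the now-unreliable path to the +10 terminal worth less than the untouched path to +1, so the agent's greedy policy flips. The paper's framework additionally seeks the *minimal* such manipulation via a hierarchical-RL formulation, which this sketch does not attempt.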