...In multiagent settings, adversarial policies can be developed by training an adversarial agent to minimize a victim agent’s rewards. Prior work has studied black-box attacks where the adversary only sees the state observations and effectively treats the victim as any other part of the environment. In this work, we experiment with white-box adversarial policies to study whether an agent’s internal state can offer useful information for other agents. We make three contributions. First, we introduce white-box adversarial policies in which an attacker can observe a victim’s internal state at each timestep. Second, we demonstrate that white-box access to a victim makes for better attacks in two-agent environments, resulting in both faster initial learning and higher asymptotic performance against the victim. Third, we show that training against white-box adversarial policies can be used to make learners in single-agent environments more robust to domain shifts.
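The difference between a black-box and a white-box attacker is mostly about what goes into the adversary's observation. The following is a minimal Python/NumPy sketch of that idea under stated assumptions. A toy two-layer victim policy is assumed, and its hidden-layer activations stand in for the "internal state." The setup is also assumed to be zero-sum, so the adversary's reward is the negative of the victim's reward. All function names here are hypothetical. This is not the paper's implementation; the linked repo has that.

```python
import numpy as np


def victim_forward(obs, w_hidden, w_out):
    """Toy two-layer victim policy (hypothetical): returns (action_logits, hidden)."""
    hidden = np.tanh(w_hidden @ obs)  # assumed stand-in for the victim's internal state
    logits = w_out @ hidden           # victim's action preferences
    return logits, hidden


def black_box_observation(env_obs):
    """Black-box attacker: sees only the environment observation."""
    return np.asarray(env_obs, dtype=float)


def white_box_observation(env_obs, victim_hidden):
    """White-box attacker: environment observation plus the victim's hidden activations."""
    return np.concatenate([np.asarray(env_obs, dtype=float), victim_hidden])


def adversary_reward(victim_reward):
    """Zero-sum assumption: the adversary is rewarded for lowering the victim's reward."""
    return -victim_reward


# Usage: one timestep with random toy weights.
rng = np.random.default_rng(0)
obs = rng.normal(size=4)
w_hidden = rng.normal(size=(8, 4))
w_out = rng.normal(size=(2, 8))

logits, hidden = victim_forward(obs, w_hidden, w_out)
bb = black_box_observation(obs)
wb = white_box_observation(obs, hidden)
print(bb.shape, wb.shape)  # (4,) (12,)
```

The adversary's policy network would then be trained on `wb` instead of `bb`. The only change is a wider input, which is what makes the white-box variant cheap to try.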
another new paper that could conceivably be worth boosting: “White-Box Adversarial Policies in Deep Reinforcement Learning”
https://arxiv.org/abs/2209.02167
https://github.com/thestephencasper/white_box_rarl
https://twitter.com/StephenLCasper/status/1567696211293110273