Curious if you have any high-level takeaways from that? Bigger models do better, clearly, but e.g. how low do you think we’ll be able to get the error rate in the next 5-10 years given expected compute growth? Are there any follow-up experiments you’d like to see happen in this space?
Also could you clarify whether the setting was for adversarial training or just a vanilla model? “During training, adversarial examples for training are constructed by PGD attacker of 30 iterations” makes me think it’s adversarial training but I could imagine this just being used for evals.
The setting was adversarial training with adversarial evaluation: during training, adversarial examples are constructed with a 30-iteration PGD attacker, and at test time the model is evaluated on an adversarial test set constructed with a 20-iteration PGD attacker.
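For concreteness, here is a minimal NumPy sketch of an L∞ PGD attacker of the kind described above. It attacks a toy binary logistic model rather than the deep networks used in the paper, and the `eps`/`alpha` values are illustrative assumptions, not the paper's settings; only the sign-gradient ascent and epsilon-ball projection steps are the point.

```python
import numpy as np

def pgd_attack(w, b, x, y, eps=8/255, alpha=2/255, iters=30):
    """L-infinity PGD against a binary logistic model p = sigmoid(w.x + b).

    Toy stand-in for the paper's setup: start from a random point in the
    eps-ball, repeatedly step in the sign of the loss gradient, and project
    back into the ball (and into valid pixel range [0, 1]) after each step.
    """
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)  # random start
    x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))  # model's P(y = 1)
        grad = (p - y) * w           # d(cross-entropy)/dx for this model
        x_adv = x_adv + alpha * np.sign(grad)       # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

Adversarial training then just means minimizing the training loss on `pgd_attack(...)` outputs (30 iterations) instead of clean inputs, while evaluation runs the same attack with 20 iterations on the test set.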
Thanks, I’d missed that!
The y-axis data are taken from Table 7 of https://arxiv.org/abs/1906.03787, and the x-axis data from Figure 7 of the same paper.