“I think in the defense-offense case the actions available to both sides are approximately the same”
If the attacker has the action “cause a 100% lethal global pandemic” and the defender has the task “prevent a 100% lethal global pandemic”, then these are clearly different problems. It is a thesis, a thing to be argued for, that the latter requires largely the same skills/tech as the former (which is what this offense-defense symmetry thesis states).
If you build an OS that you’re trying to make safe against attacks, you might, e.g., do what the seL4 microkernel team did and formally verify the OS to rule out large classes of attacks, and this is an entirely different kind of action from “find a vulnerability in the OS and develop an exploit to take control over it”.
“I wouldn’t say the strategy-stealing assumption is about a symmetric game”
Just to point out: the original strategy-stealing argument assumes literal symmetry. I think the argument only works insofar as generalizing away from literal symmetry (to, e.g., something more like linearity of the benefit of initial resources) doesn’t break it. I think you actually need something like symmetry in both instrumental goals and the “initial-resources-to-output map”.
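To illustrate why literal symmetry carries the original argument (a toy sketch, not anything from the thread): in a symmetric zero-sum game the payoff matrix is antisymmetric, and a player who copies the opponent’s mixed strategy is guaranteed expected payoff zero, i.e. cannot be at a disadvantage. The rock-paper-scissors matrix below is chosen purely for illustration.

```python
def expected_payoff(A, x, y):
    """Row player's expected payoff when row plays mixed strategy x and column plays y."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

# Rock-paper-scissors: a symmetric zero-sum game, so A is antisymmetric (A[i][j] == -A[j][i]).
A = [[ 0, -1,  1],
     [ 1,  0, -1],
     [-1,  1,  0]]

x = [0.5, 0.25, 0.25]  # any mixed strategy the opponent might use

# "Stealing" the opponent's strategy: x^T A x = 0 for any antisymmetric A,
# so the copier is guaranteed not to lose in expectation.
print(expected_payoff(A, x, x))  # → 0.0
```

Once the game is not literally symmetric (different action sets, different resource-to-output maps), this guarantee no longer comes for free, which is the point about needing symmetry in instrumental goals and the “initial-resources-to-output map”.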
The strategy-stealing argument as applied to defense-offense would say something like “whatever offense does to increase its resources / power is something that defense could also do to increase resources / power”.
Yes, but this is almost the opposite of what the offense-defense symmetry thesis is saying, because it can simultaneously be true that (1) the defender can steal the attacker’s strategies, AND (2) the defender alternatively has a bunch of much easier strategies available, by which it can defend against the attacker and keep all the resources.
The OD-symmetry thesis says that (2) is NOT true, because all such strategies in fact also require the same kinds of skills. The point of the OD-symmetry thesis is to make explicit the argument that humans cannot defend against a misaligned AI without an aligned AI of their own.
“This isn’t the same as your thesis.”
Ok, I only read this after writing all of the above, so I thought you were implying they were the same (and was confused as to why you would imply this); I’m guessing you actually just meant “these things are sort of vaguely related”.
Anyway, if I wanted to state what I think the relation is in a simple way I’d say that they give lower and upper bounds respectively on the capabilities needed from AI systems:
OD-symmetry thesis: We need our defensive AI to be at least as capable as any misaligned AI.
Strategy-stealing: We don’t need our defensive AI to be any more capable than that.
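The two bounds can be written in one line (notation is mine, purely illustrative: $c_{\mathrm{def}}$ for the capability our defensive AI needs, $c_{\mathrm{off}}$ for the capability of the most capable misaligned AI):

```latex
% Hypothetical formalization of the two theses as capability bounds.
\underbrace{c_{\mathrm{def}} \ge c_{\mathrm{off}}}_{\text{OD-symmetry (lower bound)}}
\qquad\text{and}\qquad
\underbrace{c_{\mathrm{def}} \le c_{\mathrm{off}}}_{\text{strategy-stealing (upper bound)}}
```

If both held exactly they would pin down $c_{\mathrm{def}} \approx c_{\mathrm{off}}$; as noted below, probably neither is entirely right.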
I think probably both are not entirely right.
Yes, all of that mostly sounds right to me.
I agree the formal strategy-stealing argument relies on literal symmetry; I would say the linked post applies it to asymmetric situations, where you can recover something roughly symmetric by assuming that both players need to first accumulate resources and power. (I think this is basically what you said.)