(KataGo dev here; I also gave the authors a bit of feedback on an earlier draft.)
@gwern—The “atari” attack is still a cyclic group attack, and the ViT attack is also still a cyclic group attack. I suspect it’s not so meaningful to put much weight on the particular variant that one specific adversary happens to converge to.
This is because the space of “kinds” of cyclic group fighting situations is combinatorially large, and it’s somewhat arbitrary which local minimum the adversary ends up in, because it doesn’t have much pressure to keep searching once it finds one that works. Even among just the things that are easily put into words without needing a diagram: how big is the cycle? Does the cyclic group have a big eye (>= 4 points, which behaves tactically distinctly) or a small eye (<= 3 points), or no eye? Is the eye a two-headed-dragon-style eye, or not? Does the group have more loose connections, or is it solid? Is the group inside locally dead, unsettled, or alive? Is the cyclic group racing against an outside group for liberties, or only making eyes of its own, or both? How many liberties do the various groups each have? Are there ko liberties? Are there approach liberties? Is there a cycle inside the cycle? Etc.
This is the same as how, in Go, the space of capturing race situations in general is combinatorially large, with enough complexity that many situations are difficult even for pro players who have studied them for a lifetime.
The tricky bit here is that there doesn’t seem to be (enough) generalization between the exponentially large space of big-group race situations in Go broadly and the space of situations with cyclic groups. So whereas the situations in “normal Go” get decently explored by self-play, cyclic groups are rare in self-play, so there isn’t enough data to learn them well, leaving tons of flaws, even for some cases humans consider “simple”. A reasonable mental model is that any particular adversary will probably find one or two of them somewhat arbitrarily, then rapidly converge to exploit those, without discovering the numerous others.
The “gift” attack is distinct and very interesting. There isn’t a horizon effect involved; it’s just a straightforward 2-3-move blind spot of both the policy and value heads. Being only 2-3 moves deep is also why it gets fixed by search more easily than the cyclic group attacks. As for why it happens, as a bit of context, I think these are true:
In 99.9...% of positions, the flavor of “superko” rule doesn’t affect the value of a position or the correct move.
The particular shape used by the gift adversary, and similar shapes, do occur with reasonable frequency in real games without the superko rule being relevant (due to a different order of moves), in which case the “gift shape” actually is harmless rather than being a problem.
I’ve taken a look at the raw neural net outputs, and it’s also clear that the neural net has no idea that the superko rule matters—predictions don’t vary as you change the superko rule in these positions. So my best guess is that the neural net “overgeneralizes” and doesn’t easily learn that in this one specific shape, with this specific order of moves, the superko rule, which almost never matters, suddenly does matter and flips the result.
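For readers unfamiliar with what “flavor of superko” means here, a toy sketch may help. This is not KataGo’s implementation (which hashes positions rather than storing them); it just illustrates the rule difference: positional superko forbids recreating any earlier whole-board position, while situational superko forbids recreating an earlier position only when the same player is to move. The board representation and function name below are made up for illustration.

```python
def is_superko_violation(history, board, to_move, flavor="positional"):
    """Check whether playing into `board` would repeat a prior state.

    history: list of (board, to_move) states already seen in the game.
    board:   the hypothetical resulting position (any hashable stand-in).
    flavor:  "positional" ignores whose turn it is; "situational" doesn't.
    """
    if flavor == "positional":
        return any(b == board for (b, _) in history)
    elif flavor == "situational":
        return (board, to_move) in history
    raise ValueError(f"unknown superko flavor: {flavor!r}")

# Tiny illustration, with tuples standing in for real board positions:
history = [(("x", "."), "white"), ((".", "x"), "black")]

# The position ("x", ".") occurred before, but with white to move, so the
# two flavors disagree about black recreating it:
print(is_superko_violation(history, ("x", "."), "black", "positional"))   # True
print(is_superko_violation(history, ("x", "."), "black", "situational"))  # False
```

The point of the distinction for the “gift” attack is that in almost all positions the two flavors agree on legality and value, so the net gets essentially no training signal distinguishing them.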