I see where you’re coming from, but I don’t think the exploit search they did here is fundamentally different from other kinds of computer preparation. If I were going to play chess against Magnus Carlsen I’d definitely study his games with a computer, and if that computer found a stunning refutation to an opening he liked I’d definitely play it. Should we say, then, that the computer beat Carlsen, and not me? Or leave the computer aside: if I were prepping with a friend, and my friend found the winning line, should we say my friend beat Carlsen? What if I found the line in an opening book he hadn’t read?
You win games of chess, or go, by knowing things your opponent doesn’t. Where that knowledge comes from matters to the long term health of the game, but it isn’t reflected in the match score, and I think it’s the match score that matters most here.
To my embarrassment, I have not been paying much attention to adversarial RL at all. It is clearly past time to start!
That’s fair, but I still think saying “Go has been unsolved” is importantly misleading: for one, it hasn’t been solved!
Also, the specific cycle attack doesn’t work against other engines I think? In the paper their adversary doesn’t transfer very well to LeelaZero, for example. So it’s more one particular AI having issues, than a fact about Go itself. (EDIT: As Tony clarifies below, the cyclic attack seems to be a common failure mode amongst other top Go AIs as well, so I retract this paragraph—at the very least, it seems to be a fact about ~all the top Go AIs!)
EDIT: also, I think if you got arbitrary I/O access to a Magnus simulator, and then queried it millions of times in the course of doing AlphaZero style training to derive an adversarial example, I’d say it’s pretty borderline if it’s you beating beating him. Clearly there’s some level of engine skill where it’s no longer you playing!
Also, the specific cycle attack doesn’t work against other engines I think? In the paper their adversary doesn’t transfer very well to LeelaZero, for example. So it’s more one particular AI having issues, than a fact about Go itself.
Hi, one of the authors here speaking on behalf of the team. We’re excited to see that people are interested in our latest results. Just wanted to comment a bit on transferability.
The adversary trained in our paper has a 97% winrate against KataGo at superhuman strength, a 6.1% winrate against LeelaZero at superhuman strength, and a 3.5% winrate against ELF OpenGo at superhuman strength. Moreover, in the games that we do win, we win by carrying out the cyclic-exploit (see https://goattack.far.ai/transfer), which shows that LZ and ELF are definitely susceptible. In fact, Kellin was also able to beat LZ with 100k visits using the cyclic exploit.
And while it is true that our adversary has a significantly reduced winrate against LZ/ELF compared to KataGo, even a 3.5% winrate clearly demonstrates the existence of a flaw.[1] For example looking at goratings.org, a 3.5% win rate against the world #1 (3828 elo) is approximately 3245 elo, which is still in top 250 in the world. Considering that LZ/ELF are stronger than any human, the winrate we get against them should easily correspond to a top professional level of play, if not a superhuman level. But our adversary loses to a weak amateur (myself).
We haven’t confirmed this ourselves yet, but Golaxy and FineArt (two strong Chinese Go AIs) also seem to systematically misevaluate positions with cyclic groups. Our evidence is this bilbili video, which shows off various cyclic positions that KataGo, Golaxy, and FineArt all misevaluate. Golaxy and FineArt (绝艺) are shown at the end of the video.[2]
Now these are only misevaluated test-positions, which don’t necessarily imply that a from-scratch cyclic-exploit is possible to pull off. But given that a) LZ/ELF are both vulnerable; b) LZ was manually exploited by Kellin; and c) Golaxy and FineArt misevaluate cyclic-groups in the same way as KataGo; this does not bode well for Golaxy and FineArt. We are currently in the process of getting access to FineArt to test its robustness ourselves.
To our knowledge, this attack is the first exploit that consistently wins against top programs using substantial search, without repeating specific sequences (e.g., finding a particular game that a bot lost and replaying the key parts of it). Our adversary algorithm also learned from scratch, without using any existing knowledge. However, there are other known weaknesses of bots, such as a fairly specific, complex sequence called “Mi Yuting’s Flying Dagger joseki”, or the ladder tactic. While these weaknesses were previously widespread, targeted countermeasures for them have already been created, so they cannot be used to consistently win games against top programs like KataGo. Nonetheless, these weaknesses, along with the cyclic one our adversary targets, suggest that CNN-based MCTS Go AIs have a shared set of flaws. Perhaps similar learning algorithms / neural-net architectures learn similar circuits / heuristics and thus also share the same vulnerabilities?
One question that we have been thinking about is whether the cyclic-vulnerability lies with CNNs or with AlphaZero style training. For example, some folks in multiagent systems think that “the failure of naive self play to produce unexploitable policies is textbook level material”. On the other hand, David Wu’s tree vs. cycle theory seems to suggest that certain inductive biases of CNNs are also at play.
Our adversary was also run in a weird mode against LZ/ELF, because it modeled LZ/ELF as being KataGo. We ran our transfer evaluation this way because accurately modeling LZ/ELF would have required a lot of additional software engineering. It’s not entirely clear to me that accurate modeling would necessarily help though.
Thanks for the clarification, especially how a 6.1% winrate vs LeelaZero and 3.5% winrate vs ELF still imply significantly stronger Elo than is warranted.
The fact that Kellin could defeat LZ manually as well as the positions in bilibili video do seem to suggest that this is a common weakness of many AlphaZero-style Go AIs. I retract my comment about other engines.
To our knowledge, this attack is the first exploit that consistently wins against top programs using substantial search, without repeating specific sequences (e.g., finding a particular game that a bot lost and replaying the key parts of it).
Yeah! I’m not downplaying the value of this achievement at all! It’s very cool that this attack works and can be reproduced by a human. I think this work is great (as I’ve said, for example, in my comments on the ICML paper). I’m specifically quibbling about the “solved/unsolved” terminology that the post used to use.
Perhaps similar learning algorithms / neural-net architectures learn similar circuits / heuristics and thus also share the same vulnerabilities?
Your comment reminded me of ~all the adversarial attack transfer work in the image domain, which does suggest that non-adversarially trained neural networks will tend to have the same failure modes. Whoops. Should’ve thought about those results (and the convergent learning/universality results from interp) before I posted.
Keep in mind that the adversary was specifically trained against KataGo, whereas the performance against LeelaZero and ELF is basically zero-shot. It’s likely the case that an adversary trained against LeelaZero and ELF would also win consistently.
I’ve run LeelaZero and ELF and MiniGo (yet another independent AlphaZero replication in Go) by hand in particular test positions to see what their policy and value predictions are, and they all very massively misevaluate cyclic group situations just like KataGo. Perhaps by pure happenstance different bots could “accidentally” prefer different move patterns that make it harder or easier to form the attack patterns (indeed this is almost certainly something that should vary between bots as they do have different styles and preferences to some degree), but probably the bigger contributor to the difference is explicit optimization vs zero shot.
(As for whether, e.g. a transformer architecture would have less of an issue—I genuinely have no idea, I think it could go either way, nobody I know has tried it in Go. I think it’s at least easier to see why a convnet could be susceptible to this specific failure mode, but that doesn’t mean other architectures wouldn’t be too)
One question that we have been thinking about is whether the cyclic-vulnerability lies with CNNs or with AlphaZero style training. For example, some folks in multiagent systems think that “the failure of naive self play to produce unexploitable policies is textbook level material”. On the other hand, David Wu’s tree vs. cycle theory seems to suggest that certain inductive biases of CNNs are also at play.
“Why not both?” Twitter snideness aside*, I don’t see any contradiction: cycling in multi-agent scenarios due to forgetting responses is consistent with bad inductive biases. The biases make it unable to easily learn the best response, and so it learns various inferior responses which form a cycle.
Imagine that CNNs cannot ‘see’ the circles because the receptive window grows too slowly or some CNN artifact like that; no amount of training can let it see circles in full generality and recognize the trap. But it can still learn to win: eg. with enough adversarial training against an exploiter which has learned to create circles in the top left, it learns a policy of being scared of circles in the top left, and stops losing by learning to create circles in the other corners (where, as it happens, it is not currently being exploited); then the exploiter resumes training and learns to create circles in the top right, where the CNN falls right into the trap, and so it returns to winning; then the adversarial training resumes and it forgets the avoid-top-left strategy and learns the avoid-top-right strategy… And so on forever. The CNN cannot learn a policy of ‘never create circles in any corner’ because you can’t win a game of Go like that, and CNN/exploiter just circle around the 4 corners playing rock-paper-scissors-spock eternally.
* adversarial spheres looks irrelevant to me, and the other paper is relevant but attacks a fixed policy which is not the case with MCTS, especially with extremely large search budgets—which is supposed to be complete in the limit and is also changing the policy at runtime by policy-improvement
Also, the specific cycle attack doesn’t work against other engines I think? In the paper their adversary doesn’t transfer very well to LeelaZero, for example. So it’s more one particular AI having issues, than a fact about Go itself.
Sure, but refutations don’t transfer to different openings either, right? I feel like most game-winning insights are contingent in this sense, rather than being fundamental to the game.
EDIT: also, I think if you got arbitrary I/O access to a Magnus simulator, and then queried it millions of times in the course of doing AlphaZero style training to derive an adversarial example, I’d say it’s pretty borderline if it’s you beating beating him. Clearly there’s some level of engine skill where it’s no longer you playing!
This is a really interesting hypothetical, but I see it differently.
If the engine isn’t feeding me moves over the board (which I certainly agree would be cheating), then it has to give me something I can memorize and use later. But I can’t memorize a whole dense game tree full of winning lines (and the AI can’t calculate that anyway), so it has to give me something compressed (that is, abstract) that I can decompress and apply to a bunch of different board positions. If a human trainer did that we’d call those compressed things “insights”, “tactics”, or “strategies”, and I don’t think making the trainer into a mostly-superhuman computer changes anything. I had to learn all the tactics and insights, and I had to figure out how to apply them; what is chess, aside from that?
Also, I wouldn’t expect that Carlsen has any flaws that a generally weak player, whether human or AI, could exploit the way the cyclic adversary exploits KataGo. It would find flaws, and win consistently, but if the pattern in its play were comprehensible at all it would be the kind of thing that you have to be a super-GM yourself to take advantage of. Maybe your intuition is different here? In limit I’d definitely agree with you: if the adversarial AI spit out something like “Play 1. f3 2. Kf2 and he’ll have a stroke and start playing randomly”, then yeah, that can’t really be called “beating Carlsen” anymore. But the key point there, to me, isn’t the strength of the trainer so much as the simplicity of the example; I’d have the same objection no matter how easy the exploit was to find.
If I were going to play chess against Magnus Carlsen I’d definitely study his games with a computer, and if that computer found a stunning refutation to an opening he liked I’d definitely play it.
Conditionally on him continuing to play the opening, I would expect he has a refutation to that refutation, but no reason to use the counter-refutation in public games against the computer. On the other hand, he may not want to burn it on you either.
I see where you’re coming from, but I don’t think the exploit search they did here is fundamentally different from other kinds of computer preparation. If I were going to play chess against Magnus Carlsen I’d definitely study his games with a computer, and if that computer found a stunning refutation to an opening he liked I’d definitely play it. Should we say, then, that the computer beat Carlsen, and not me? Or leave the computer aside: if I were prepping with a friend, and my friend found the winning line, should we say my friend beat Carlsen? What if I found the line in an opening book he hadn’t read?
You win games of chess, or go, by knowing things your opponent doesn’t. Where that knowledge comes from matters to the long term health of the game, but it isn’t reflected in the match score, and I think it’s the match score that matters most here.
To my embarrassment, I have not been paying much attention to adversarial RL at all. It is clearly past time to start!
That’s fair, but I still think saying “Go has been unsolved” is importantly misleading: for one, it hasn’t been solved!
Also, the specific cycle attack doesn’t work against other engines I think? In the paper their adversary doesn’t transfer very well to LeelaZero, for example. So it’s more one particular AI having issues, than a fact about Go itself. (EDIT: As Tony clarifies below, the cyclic attack seems to be a common failure mode amongst other top Go AIs as well, so I retract this paragraph—at the very least, it seems to be a fact about ~all the top Go AIs!)
EDIT: also, I think if you got arbitrary I/O access to a Magnus simulator, and then queried it millions of times in the course of doing AlphaZero style training to derive an adversarial example, I’d say it’s pretty borderline if it’s you beating beating him. Clearly there’s some level of engine skill where it’s no longer you playing!
Hi, one of the authors here speaking on behalf of the team. We’re excited to see that people are interested in our latest results. Just wanted to comment a bit on transferability.
The adversary trained in our paper has a 97% winrate against KataGo at superhuman strength, a 6.1% winrate against LeelaZero at superhuman strength, and a 3.5% winrate against ELF OpenGo at superhuman strength. Moreover, in the games that we do win, we win by carrying out the cyclic-exploit (see https://goattack.far.ai/transfer), which shows that LZ and ELF are definitely susceptible. In fact, Kellin was also able to beat LZ with 100k visits using the cyclic exploit.
And while it is true that our adversary has a significantly reduced winrate against LZ/ELF compared to KataGo, even a 3.5% winrate clearly demonstrates the existence of a flaw.[1] For example looking at goratings.org, a 3.5% win rate against the world #1 (3828 elo) is approximately 3245 elo, which is still in top 250 in the world. Considering that LZ/ELF are stronger than any human, the winrate we get against them should easily correspond to a top professional level of play, if not a superhuman level. But our adversary loses to a weak amateur (myself).
We haven’t confirmed this ourselves yet, but Golaxy and FineArt (two strong Chinese Go AIs) also seem to systematically misevaluate positions with cyclic groups. Our evidence is this bilbili video, which shows off various cyclic positions that KataGo, Golaxy, and FineArt all misevaluate. Golaxy and FineArt (绝艺) are shown at the end of the video.[2]
Now these are only misevaluated test-positions, which don’t necessarily imply that a from-scratch cyclic-exploit is possible to pull off. But given that a) LZ/ELF are both vulnerable; b) LZ was manually exploited by Kellin; and c) Golaxy and FineArt misevaluate cyclic-groups in the same way as KataGo; this does not bode well for Golaxy and FineArt. We are currently in the process of getting access to FineArt to test its robustness ourselves.
To our knowledge, this attack is the first exploit that consistently wins against top programs using substantial search, without repeating specific sequences (e.g., finding a particular game that a bot lost and replaying the key parts of it). Our adversary algorithm also learned from scratch, without using any existing knowledge. However, there are other known weaknesses of bots, such as a fairly specific, complex sequence called “Mi Yuting’s Flying Dagger joseki”, or the ladder tactic. While these weaknesses were previously widespread, targeted countermeasures for them have already been created, so they cannot be used to consistently win games against top programs like KataGo. Nonetheless, these weaknesses, along with the cyclic one our adversary targets, suggest that CNN-based MCTS Go AIs have a shared set of flaws. Perhaps similar learning algorithms / neural-net architectures learn similar circuits / heuristics and thus also share the same vulnerabilities?
One question that we have been thinking about is whether the cyclic-vulnerability lies with CNNs or with AlphaZero style training. For example, some folks in multiagent systems think that “the failure of naive self play to produce unexploitable policies is textbook level material”. On the other hand, David Wu’s tree vs. cycle theory seems to suggest that certain inductive biases of CNNs are also at play.
Our adversary was also run in a weird mode against LZ/ELF, because it modeled LZ/ELF as being KataGo. We ran our transfer evaluation this way because accurately modeling LZ/ELF would have required a lot of additional software engineering. It’s not entirely clear to me that accurate modeling would necessarily help though.
The same bilbili poster also appears to have replicated our manual cyclic-exploit against various versions of KataGo with a 9-stone handicap: https://space.bilibili.com/33337424/channel/seriesdetail?sid=2973285.
Thanks for the clarification, especially how a 6.1% winrate vs LeelaZero and 3.5% winrate vs ELF still imply significantly stronger Elo than is warranted.
The fact that Kellin could defeat LZ manually as well as the positions in bilibili video do seem to suggest that this is a common weakness of many AlphaZero-style Go AIs. I retract my comment about other engines.
Yeah! I’m not downplaying the value of this achievement at all! It’s very cool that this attack works and can be reproduced by a human. I think this work is great (as I’ve said, for example, in my comments on the ICML paper). I’m specifically quibbling about the “solved/unsolved” terminology that the post used to use.
Your comment reminded me of ~all the adversarial attack transfer work in the image domain, which does suggest that non-adversarially trained neural networks will tend to have the same failure modes. Whoops. Should’ve thought about those results (and the convergent learning/universality results from interp) before I posted.
Keep in mind that the adversary was specifically trained against KataGo, whereas the performance against LeelaZero and ELF is basically zero-shot. It’s likely the case that an adversary trained against LeelaZero and ELF would also win consistently.
I’ve run LeelaZero and ELF and MiniGo (yet another independent AlphaZero replication in Go) by hand in particular test positions to see what their policy and value predictions are, and they all very massively misevaluate cyclic group situations just like KataGo. Perhaps by pure happenstance different bots could “accidentally” prefer different move patterns that make it harder or easier to form the attack patterns (indeed this is almost certainly something that should vary between bots as they do have different styles and preferences to some degree), but probably the bigger contributor to the difference is explicit optimization vs zero shot.
So all signs point to this misgeneralization being general to AlphaZero with convnets, not one particular bot. In another post here https://www.lesswrong.com/posts/Es6cinTyuTq3YAcoK/there-are-probably-no-superhuman-go-ais-strong-human-players?commentId=gAEovdd5iGsfZ48H3 I explain why I think it’s intuitive how and why a convnet would learn the an incorrect algorithm first and then get stuck on it given the data.
(As for whether, e.g. a transformer architecture would have less of an issue—I genuinely have no idea, I think it could go either way, nobody I know has tried it in Go. I think it’s at least easier to see why a convnet could be susceptible to this specific failure mode, but that doesn’t mean other architectures wouldn’t be too)
“Why not both?” Twitter snideness aside*, I don’t see any contradiction: cycling in multi-agent scenarios due to forgetting responses is consistent with bad inductive biases. The biases make it unable to easily learn the best response, and so it learns various inferior responses which form a cycle.
Imagine that CNNs cannot ‘see’ the circles because the receptive window grows too slowly or some CNN artifact like that; no amount of training can let it see circles in full generality and recognize the trap. But it can still learn to win: eg. with enough adversarial training against an exploiter which has learned to create circles in the top left, it learns a policy of being scared of circles in the top left, and stops losing by learning to create circles in the other corners (where, as it happens, it is not currently being exploited); then the exploiter resumes training and learns to create circles in the top right, where the CNN falls right into the trap, and so it returns to winning; then the adversarial training resumes and it forgets the avoid-top-left strategy and learns the avoid-top-right strategy… And so on forever. The CNN cannot learn a policy of ‘never create circles in any corner’ because you can’t win a game of Go like that, and CNN/exploiter just circle around the 4 corners playing rock-paper-scissors-spock eternally.
* adversarial spheres looks irrelevant to me, and the other paper is relevant but attacks a fixed policy which is not the case with MCTS, especially with extremely large search budgets—which is supposed to be complete in the limit and is also changing the policy at runtime by policy-improvement
Sure, but refutations don’t transfer to different openings either, right? I feel like most game-winning insights are contingent in this sense, rather than being fundamental to the game.
This is a really interesting hypothetical, but I see it differently.
If the engine isn’t feeding me moves over the board (which I certainly agree would be cheating), then it has to give me something I can memorize and use later. But I can’t memorize a whole dense game tree full of winning lines (and the AI can’t calculate that anyway), so it has to give me something compressed (that is, abstract) that I can decompress and apply to a bunch of different board positions. If a human trainer did that we’d call those compressed things “insights”, “tactics”, or “strategies”, and I don’t think making the trainer into a mostly-superhuman computer changes anything. I had to learn all the tactics and insights, and I had to figure out how to apply them; what is chess, aside from that?
Also, I wouldn’t expect that Carlsen has any flaws that a generally weak player, whether human or AI, could exploit the way the cyclic adversary exploits KataGo. It would find flaws, and win consistently, but if the pattern in its play were comprehensible at all it would be the kind of thing that you have to be a super-GM yourself to take advantage of. Maybe your intuition is different here? In limit I’d definitely agree with you: if the adversarial AI spit out something like “Play 1. f3 2. Kf2 and he’ll have a stroke and start playing randomly”, then yeah, that can’t really be called “beating Carlsen” anymore. But the key point there, to me, isn’t the strength of the trainer so much as the simplicity of the example; I’d have the same objection no matter how easy the exploit was to find.
Curious what you make of this.
Conditionally on him continuing to play the opening, I would expect he has a refutation to that refutation, but no reason to use the counter-refutation in public games against the computer. On the other hand, he may not want to burn it on you either.