It looks like there is a new version of the attack, which wins against a version of KataGo that does not pass and that uses enough search to be handily superhuman (though much less than would typically be used in practice).
Looking at the first game, it seems the adversary causes KataGo to make a very serious blunder. I think this addresses the concern, raised in other comments here, that the attack wins only on a technicality.
It’s still theoretically unsurprising that self-play is exploitable, but I think it’s nontrivial and interesting that a neural network playing at this level makes such severe errors. I also think many ML researchers would be surprised by the quality of this attack. (Indeed, even after the paper came out, I expect many readers still thought it would not be possible to mount a convincing attack without relying on technicalities or on a version of the policy with extremely minimal search.)