Over on the Discord server where the creator of KataGo and a bunch of other computer go people hang out, there’s what might be an interesting development. KataGo’s creator says he’s tried to reproduce the results and failed: the adversary does indeed provoke misbehaviour from KataGo’s policy network alone (which no one should be worried by; the job of the policy network is to propose moves for search, not to play well on its own, though it turns out it does happen to play quite well by the standards of puny humans), but even a teeny-tiny amount of search makes the losing-passing stop completely.
I’m able to replicate the raw policy being “vulnerable” and I observe that the adversarial positions tend to raise the probability of the raw policy on pass, but I’m unable to replicate KataGo wanting to pass with even a tiny amount of search in the positions where the SGFs published has it passing.
If I understand correctly, the version of KG used by the paper’s authors is (1) not the latest and (2) modified in order to support their research. So it seems possible that (1) KG’s behaviour has changed somehow or (2) the researchers’ modifications actually broke something. Or that (3) there’s just a bug in KG’s play-a-match-against-yourself code, not introduced by the researchers, that makes the attack succeed in that context even when KG gets to do some searching. My impression is that #3 is viewed as quite plausible by KG’s creator.