Yes, I did some interpretability on the policy network of Leela Zero. Planning to post the results very soon! But I did not particularly look into the attack described here, and while there was one REMIX group that looked into a problem related to liberty counting, they didn’t get very far. I do agree this is an obvious problem to tackle with interpretability- I think it’s likely not that hard to get a rough idea why the cyclic attack works.
Yes, I did some interpretability on the policy network of Leela Zero. Planning to post the results very soon! But I did not particularly look into the attack described here, and while there was one REMIX group that looked into a problem related to liberty counting, they didn’t get very far. I do agree this is an obvious problem to tackle with interpretability- I think it’s likely not that hard to get a rough idea why the cyclic attack works.