polytope comments on When do “brains beat brawn” in Chess? An experiment

polytope 7 Jul 2023 4:43 UTC
14 points
5
(I’m the main KataGo dev/researcher)
Just some notes about KataGo: the degree to which KataGo has been trained to play well against weaker players is relatively minor. The only notable thing KataGo does is, in some self-play games, to give one side up to an 8x advantage in playouts over the other, with both sides aware of the asymmetry. (KataGo also initializes some games with handicap stones to make them in-distribution, and/or adjusts komi to make the game fair.) So the strong side learns to prefer positions that elicit a higher chance of mistakes by the weaker side, while the weak side learns to prefer simpler positions where shallower search doesn’t hurt as much.
This method is cute because it adds pressure to only learn “general high-level strategies” for exploiting a compute advantage, instead of memorizing specific exploits (which one might hypothesize to be less likely to generalize to arbitrary opponents). Any specific winning exploit learned by the stronger side that works too well will be learned by the weaker side (it’s the same neural net!) and subsequently will be avoided and stop working.
And it’s interesting that “play for positions that a compute-limited version of yourself might mess up more” correlates with “play for positions that a weaker human player might mess up in”.
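For concreteness, here is a rough sketch of the playout asymmetry described above. It is not KataGo's actual code: the function name, the base playout count, the probability of an asymmetric game, and the uniform distribution over ratios are all made-up assumptions for illustration. The only thing taken from the description above is the cap of 8x.

```python
import random
from dataclasses import dataclass


@dataclass
class PlayoutBudget:
    """Per-game search budget for each side (hypothetical structure)."""
    black: int
    white: int


def sample_playout_budget(rng, base_playouts=600, asym_prob=0.1, max_ratio=8.0):
    """Pick playout counts for one self-play game.

    All defaults are illustrative assumptions, not KataGo's real settings.
    With probability `asym_prob`, one randomly chosen side gets up to
    `max_ratio` times as many playouts as the other.
    """
    if rng.random() >= asym_prob:
        # Ordinary game: both sides search with the same budget.
        return PlayoutBudget(base_playouts, base_playouts)

    # Asymmetric game: draw a ratio in [1, max_ratio] (uniform is an assumption).
    ratio = rng.uniform(1.0, max_ratio)
    strong = int(round(base_playouts * ratio))

    # Choose at random which color gets the larger budget.
    if rng.random() < 0.5:
        return PlayoutBudget(black=strong, white=base_playouts)
    return PlayoutBudget(black=base_playouts, white=strong)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(sample_playout_budget(rng, asym_prob=0.5))
```

In a real training loop the sampled budget would also have to be made visible to both sides, since each side knowing about the asymmetry is what lets the same net learn both the "strong" and the "weak" behavior. This sketch only covers the sampling step.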
But because this method doesn’t adapt to exploit any particular opponent, and is entirely ignorant of many tendencies of play shared widely across all humans, I would still say it’s pretty minor. I don’t have hard data, but from firsthand subjective observation I’m decently confident that top human amateurs or pros do a better job than KataGo would at playing high-handicap games (> 6 stones) against players who are more than that many ranks weaker than them, despite KataGo being stronger in “normal” gameplay. KataGo definitely plays too “honestly”, even with the above training method, and lacks knowledge of what weaker humans find hard.

If you really wanted to build a strong anti-human handicap game bot in Go, you’d absolutely start by learning to better model human play, using the millions of games available online.
(As for the direct gap with the very best pro players, without any specific anti-bot exploits, at tournament-like time controls I think it’s more like 2 stones than 3-4. I could believe 3-4 for some weaker pros, or if you used ultra-blitz time controls, since shorter time controls tend to favor bots over humans.)