DanielFilan comments on Inside the mind of a superhuman Go model: How does Leela Zero read ladders?

DanielFilan 2 Mar 2023 22:08 UTC
LW: 4 AF: 4
Note that the model does not have black or white stones as a concept, and instead only thinks of the stones as “own’s stones” and “opponent’s stones”, so we can do this without loss of generality.
I’m confused about how this can be true. Surely the model needs to know which player is black and which is white in order to account for komi, right?
- polytope 4 Mar 2023 1:27 UTC
  LW: 5 AF: 4
  There is a pair of binary channels that indicate whether the acting player is receiving komi or paying it. (You can also think of these as a “player is black” versus “player is white” indicator, but interpreting them as komi indicators is equivalent, and it is the natural way to extend Leela Zero to different komi values without changing the architecture or input encoding.)
  In fact, you can set the channels to fractional values strictly between 0 and 1 to see what the model thinks of a board state under reduced-komi or no-komi conditions. Leela Zero is only trained on the 0 and 1 endpoints, which correspond to komi +7.5 or komi −7.5 for the acting player, so there is no guarantee that its behavior for fractional values is reasonable. Still, I recall people finding that many of Leela Zero’s models interpolate their output for fractional values in a not totally unreasonable way!
  If I recall right, it tended to be the smaller models that behaved well, whereas some of the later and larger models behaved totally nonsensically for fractional values. If I’m not mistaken about that, then as a total guess, perhaps later and larger models have more degrees of freedom with which to fit or overfit, so arbitrary non-interpolating behavior can arise between the endpoints, and/or they have more extreme differences in activations at the endpoints, which constrain the middle less and give it more room to wiggle and do bad non-monotone things.
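
The fractional-komi experiment described in the reply above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Leela Zero's actual code: the plane indices (`PAYS_KOMI`, `GETS_KOMI`), the linear mapping from komi to a channel value, and `toy_value_net` (a stand-in for a real network's winrate output) are all hypothetical. The sketch fills the two indicator planes with complementary fractional values, sweeps komi between the two trained endpoints, and checks whether the resulting winrates are monotone.

```python
import math

BOARD_POINTS = 19 * 19
N_PLANES = 18      # assumption: 16 stone-history planes plus 2 indicator planes
PAYS_KOMI = 16     # assumption: plane that is all ones when the acting player pays komi
GETS_KOMI = 17     # assumption: plane that is all ones when the acting player receives komi


def komi_to_fraction(komi, full_komi=7.5):
    """Map komi for the acting player in [-full_komi, +full_komi] linearly to [0, 1]."""
    if not -full_komi <= komi <= full_komi:
        raise ValueError("komi lies outside the range spanned by the two endpoints")
    return (komi + full_komi) / (2 * full_komi)


def with_komi(planes, komi):
    """Return a copy of `planes` whose two indicator planes encode `komi`."""
    t = komi_to_fraction(komi)
    out = [list(p) for p in planes]          # copy so the input is left untouched
    out[GETS_KOMI] = [t] * BOARD_POINTS
    out[PAYS_KOMI] = [1.0 - t] * BOARD_POINTS
    return out


def is_monotone(values, tol=1e-9):
    """True if `values` never decreases by more than `tol`."""
    return all(b - a >= -tol for a, b in zip(values, values[1:]))


def toy_value_net(planes):
    """Hypothetical stand-in for a real network: winrate rises smoothly with komi."""
    t = planes[GETS_KOMI][0]
    return 1.0 / (1.0 + math.exp(-4.0 * (t - 0.5)))


# Empty board, swept from komi -7.5 to +7.5 for the acting player in steps of 2.5.
position = [[0.0] * BOARD_POINTS for _ in range(N_PLANES)]
komis = [-7.5 + 2.5 * i for i in range(7)]
winrates = [toy_value_net(with_komi(position, k)) for k in komis]
print(is_monotone(winrates))
```

With a real model in place of `toy_value_net`, a failed monotonicity check over such a sweep would be one way to detect the non-interpolating behavior described above.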