Yes, this is another reason that setups like OP's are lower bounds. Stockfish, like most game RL AIs, is trying to play the Nash-equilibrium move, not the maximally-exploitative move against the current player. It will punish the player for any deviation from Nash, but it will not itself risk deviating from Nash in the hope of tempting the player into an even larger error: it assumes it is playing against something as good as or better than itself, against which any such deviation would simply be met with a Nash reply & turn out badly.
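The Nash-vs-exploitative distinction can be made concrete with a toy matrix game (rock-paper-scissors, not chess; the numbers below are illustrative, not anything from Stockfish). The Nash mixture is safe against any opponent but leaves value on the table against a flawed one; the maximally-exploitative strategy is a best response to the opponent's actual mixture, and would itself be punished if the opponent were playing Nash:

```python
# Toy illustration: in rock-paper-scissors, the Nash strategy is uniform
# and guarantees expected value 0 against any opponent, while the
# maximally-exploitative strategy is a best response to the opponent's
# actual (non-Nash) mixture and earns more -- but is itself exploitable
# if the opponent turns out to be playing well.

# payoff[i][j]: row player's payoff; moves indexed rock=0, paper=1, scissors=2
PAYOFF = [[0, -1,  1],
          [1,  0, -1],
          [-1, 1,  0]]

def expected_value(my_mix, opp_mix):
    """Expected payoff of one mixed strategy against another."""
    return sum(my_mix[i] * opp_mix[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

nash = [1/3, 1/3, 1/3]
weak_opp = [0.6, 0.2, 0.2]   # a flawed opponent who over-plays rock

# Nash play: safe (EV 0 against anything) but un-ambitious.
ev_nash = expected_value(nash, weak_opp)              # 0.0

# Exploitative play: pure best response (always paper) to this opponent.
best_response = [0.0, 1.0, 0.0]
ev_exploit = expected_value(best_response, weak_opp)  # 0.4
```

Stockfish's situation is the analogue of playing `nash` forever: every opponent error is punished, but it never steers toward positions where a weak opponent is likely to blunder further.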
You could frame it as an imitation-learning problem like Maia. But you could also train directly: Stockfish could be trained against a mixture of opponents, and at scale it should learn to observe the board state (I don't know if it needs the move history per se, since the stage of the game + the current margin of victory ought to encode the Elo difference, and may be a sufficient statistic for Elo), infer the enemy's playing strength, and calibrate its play appropriately when doing tree search & predicting the enemy's responses. Silver & Veness 2010 comes to mind as an example of how you'd do MCTS with this sort of hidden information (the enemy's unknown Elo strength), which turns the game into a POMDP rather than an MDP.
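A minimal sketch of the hidden-Elo idea, under invented assumptions (discrete Elo buckets, and a toy opponent model where the probability of finding the engine-best move is a simple function of Elo; neither is from Stockfish or from Silver & Veness, which deals with general POMDP planning via particle-filter MCTS): keep a belief over the opponent's Elo, update it Bayesian-style from their observed moves, and sample a concrete Elo from the belief at the root of each search simulation, so the search automatically plans against the opponent it has actually inferred:

```python
# Sketch of treating the opponent's unknown strength as the hidden state
# of a POMDP, in the spirit of Silver & Veness 2010's POMCP: belief over
# discrete Elo buckets, Bayes updates from observed move quality, and
# per-simulation sampling of a concrete Elo for the opponent model.
import random

ELO_BUCKETS = [1200, 1600, 2000, 2400]

def p_best_move(elo):
    """Hypothetical opponent model: chance of finding the best move."""
    return min(0.95, elo / 2800)

def update_belief(belief, played_best):
    """Bayes update after observing whether the last move was engine-best."""
    posterior = {}
    for elo, prior in belief.items():
        like = p_best_move(elo) if played_best else 1 - p_best_move(elo)
        posterior[elo] = prior * like
    z = sum(posterior.values())
    return {elo: w / z for elo, w in posterior.items()}

def sample_opponent_elo(belief):
    """At the root of each MCTS simulation, draw one Elo hypothesis; the
    rest of that simulation then models the opponent at that strength."""
    elos = list(belief)
    return random.choices(elos, weights=[belief[e] for e in elos])[0]

# Start uniform; after watching the opponent miss the best move three
# times, the belief concentrates on the weaker buckets, and sampled
# simulations increasingly plan against a weak opponent.
belief = {e: 1 / len(ELO_BUCKETS) for e in ELO_BUCKETS}
for _ in range(3):
    belief = update_belief(belief, played_best=False)
```

The observation model here is deliberately crude; the point is only the structure: because the opponent's Elo is unobserved, the search must average over (or sample from) a belief rather than condition on a known opponent, which is exactly what makes it a POMDP rather than an MDP.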
For a clear example of this: in endgames where I have a winning position but little to no idea how to win, Stockfish's king will often head for the hills, to delay the coming mate as long as theoretically possible.
That makes my win very easy, because the computer's king isn't around to help out in defence.
This is not a merely theoretical difficulty! It makes it very hard to practise endgames against the computer.