gwern comments on anaguma’s Shortform

gwern 31 Dec 2024 19:55 UTC
5 points
2

An intuition I’ve had for some time is that search is what enables an agent to control the future. I’m a chess player rated around 2000. The difference between me and Magnus Carlsen is that in complex positions, he can search much further for a win, such that I have virtually no chance against him; the difference between me and an amateur chess player is similarly vast.

This is at best over-simplified in terms of thinking about ‘search’: Magnus Carlsen would also beat you or an amateur at bullet chess, at any time control:

As of December 2024, Carlsen is also ranked No. 1 in the FIDE rapid rating list with a rating of 2838, and No. 1 in the FIDE blitz rating list with a rating of 2890.[495]

(See for example the forward-pass-only Elos of chess/Go agents; Jones 2021 includes scaling law work on predicting the zero-search strength of agents, with no apparent upper bound.)
- Daniel Tan 1 Jan 2025 11:02 UTC
  1 point
  0
  I think the natural counterpoint here is that the policy network could still be construed as doing search; it is just that all the compute was invested during training and amortised later across many inferences.
  
  Magnus Carlsen is better than average players for a couple of reasons:
  1. Better “evaluation”; the ability to look at a position and accurately estimate likelihood of winning given optimal play
  2. Better “search”; a combination of heuristic shortcuts and raw calculation power that let him see further ahead
  So I agree that search isn’t the only relevant dimension. An average player given unbounded compute might make up for a deficit in (1) just by exhaustively searching the game tree, but this seems to require such astronomical amounts of compute that it’s not worth discussing.
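
To make the evaluation/search distinction concrete, here is a minimal sketch on a toy game rather than chess. Everything in it is an illustrative assumption and not something from the thread: the game (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins), and the names `weak_eval`, `strong_eval`, `negamax`, and `best_move`. The point is only that a good evaluator can stand in for deep search, while a knowledge-free evaluator needs the search to do all the work.

```python
from typing import Callable

MOVES = (1, 2, 3)  # a player may remove 1, 2, or 3 stones per turn

Evaluator = Callable[[int], float]


def weak_eval(stones: int) -> float:
    """No positional knowledge: every unfinished position looks equal."""
    return 0.0


def strong_eval(stones: int) -> float:
    """Exact value for the side to move in this toy game:
    the position is lost iff the pile is a multiple of 4."""
    return -1.0 if stones % 4 == 0 else 1.0


def negamax(stones: int, depth: int, evaluate: Evaluator) -> float:
    """Value of the position for the side to move, looking `depth` plies ahead."""
    if stones == 0:
        # The opponent just took the last stone, so the side to move has lost.
        return -1.0
    if depth == 0:
        # Search budget exhausted: fall back on the static evaluation.
        return evaluate(stones)
    return max(
        -negamax(stones - m, depth - 1, evaluate)
        for m in MOVES
        if m <= stones
    )


def best_move(stones: int, depth: int, evaluate: Evaluator) -> int:
    """Pick the move with the highest negamax value (depth >= 1).
    Ties go to the smallest move, because max() keeps the first maximum."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -negamax(stones - m, depth - 1, evaluate))


if __name__ == "__main__":
    # From 10 stones the winning move is to take 2, leaving a multiple of 4.
    print(best_move(10, 1, strong_eval))  # good evaluation, one ply of search
    print(best_move(10, 1, weak_eval))    # no knowledge, one ply of search
    print(best_move(10, 10, weak_eval))   # no knowledge, full-depth search
```

With `strong_eval`, a single ply is enough to find the winning move. With `weak_eval`, a single ply cannot tell the moves apart and falls back on the tie-break, while searching to the end of the game recovers the winning move. In this toy game the exhaustive search is cheap; the astronomical cost mentioned above comes from how much larger the chess game tree is.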