Thoughts on o3 and search:

[Epistemic status: not new, but I thought I should share]
An intuition I’ve had for some time is that search is what enables an agent to control the future. I’m a chess player rated around 2000. The difference between me and Magnus Carlsen is that in complex positions, he can search much further for a win, such that I’d have virtually no chance against him; the difference between me and an amateur chess player is similarly vast. It’s not just about winning, either: in Shogi, the top professionals, once they know they have won, continue searching over future variations to find the most aesthetically appealing mate.
This is one of the reasons that I’m concerned about AI. It’s not bound by the same constraints of time, energy, and memory as humans, so it can search through possible futures very deeply to find the narrow path in which it achieves its goal. o3 looks to be on this path. It has both very long chains of thought (depth of search) and the ability to parallelize across multiple instances (the best-of-n sampling that solved ARC-AGI). To be clear, I don’t think this search is very efficient, and there are many obvious ways in which it could be improved: e.g. recurrent architectures, which don’t waste as much compute computing logprobs for several tokens only to sample one, or multi-token prediction objectives for the base model, as shown in DeepSeek-V3. But the basis for search is there. Until now, it seemed like AI was improving its intuition; now it can finally begin to think.
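To give a concrete picture of the parallel half of this, here is a minimal sketch of best-of-n sampling. `generate` and `score` are hypothetical stand-ins for the model’s sampling call and an outcome reward model; nothing here is based on OpenAI’s actual implementation.

```python
import random

def generate(prompt: str) -> str:
    """Hypothetical stand-in for sampling one chain of thought from a model."""
    return f"candidate answer to {prompt!r} (seed {random.random():.3f})"

def score(prompt: str, answer: str) -> float:
    """Hypothetical stand-in for an outcome reward model / verifier."""
    return random.random()

def best_of_n(prompt: str, n: int = 6) -> str:
    """Sample n independent traces and keep the highest-scoring one.

    Depth of search comes from each trace's chain of thought;
    breadth comes from n. Compute scales linearly with n.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))

print(best_of_n("hard ARC-AGI task", n=6))
```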
Concretely, I expect that by 2030, AI systems will use as much compute at inference time on hard problems as is currently used for pretraining the largest models. Possibly more if humanity is not around by then.
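For scale, here is a back-of-envelope sketch using the standard approximations of ~6·N·D FLOPs for pretraining and ~2·N FLOPs per generated token at inference. The parameter and token counts below are illustrative assumptions, not claims about any particular model.

```python
# Back-of-envelope: how many inference tokens match a pretraining run?
# Pretraining ~= 6*N*D FLOPs; inference ~= 2*N FLOPs per generated token.
# All numbers below are illustrative assumptions.

N = 1e12   # assumed model parameters (1T)
D = 2e13   # assumed pretraining tokens (20T)

pretrain_flops = 6 * N * D    # ~1.2e26 FLOPs
flops_per_token = 2 * N       # ~2e12 FLOPs per inference token

tokens_to_match = pretrain_flops / flops_per_token  # = 3 * D
print(f"pretraining compute: {pretrain_flops:.1e} FLOPs")
print(f"inference tokens to match it: {tokens_to_match:.1e}")
# ~6e13 tokens: vast for a single problem today, but less outlandish
# spread across many hard problems, or many parallel traces per problem.
```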
An intuition I’ve had for some time is that search is what enables an agent to control the future. I’m a chess player rated around 2000. The difference between me and Magnus Carlsen is that in complex positions, he can search much further for a win, such that I’d have virtually no chance against him; the difference between me and an amateur chess player is similarly vast.
This is at best over-simplified in terms of thinking about ‘search’: Magnus Carlsen would also beat you or an amateur at bullet chess, or indeed at any time control:
As of December 2024, Carlsen is also ranked No. 1 in the FIDE rapid rating list with a rating of 2838, and No. 1 in the FIDE blitz rating list with a rating of 2890.
(See for example the forward-pass-only Elos of chess/Go agents; Jones 2021 includes scaling law work on predicting the zero-search strength of agents, with no apparent upper bound.)
I think the natural counterpoint here is that the policy network could still be construed as doing search; just that all the compute was invested during training and amortised later across many inferences.
Magnus Carlsen is better than average players for a couple of reasons:
1. Better “evaluation”: the ability to look at a position and accurately estimate the likelihood of winning given optimal play
2. Better “search”: a combination of heuristic shortcuts and raw calculation power that lets him see further ahead
So I agree that search isn’t the only relevant dimension. An average player given unbounded compute might overcome (1) just by exhaustively searching the game tree, but this would require such astronomical amounts of compute that it’s not worth discussing.
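To put a rough number on “astronomical”, here is a quick sanity check using the commonly cited figures of ~35 legal moves per position and ~80 plies per game for chess:

```python
# Why exhaustive search is off the table: rough size of the chess game tree.
import math

branching_factor = 35   # commonly cited average legal moves per position
plies = 80              # commonly cited average game length in half-moves

tree_size = branching_factor ** plies
print(f"~10^{math.log10(tree_size):.0f} positions")   # roughly 10^124

# Even at 1e18 positions/second (far beyond any existing hardware),
# the search would take ~10^106 seconds; the universe is only ~4e17
# seconds old.
seconds = tree_size / 1e18
print(f"~10^{math.log10(seconds):.0f} seconds at 1e18 positions/s")
```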
The low-resource configuration of o3, which aggregates only 6 traces, already improved a lot on the results of previous contenders; the plot of performance against problem size shows this very clearly. Is there a reason to suspect that the aggregation is best-of-n rather than consensus (picking the most popular answer)? Their outcome reward model might have systematic errors worse than those of the generative model, since the ground truth is in the verifiers anyway.
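To make the two aggregation rules concrete, here is a minimal sketch; the function names and the toy reward are mine, not anything from OpenAI’s reports:

```python
from collections import Counter
from typing import Callable, List

def best_of_n(answers: List[str], reward: Callable[[str], float]) -> str:
    """Pick the single trace an outcome reward model scores highest.
    Inherits any systematic errors of the reward model."""
    return max(answers, key=reward)

def consensus(answers: List[str]) -> str:
    """Pick the most popular final answer across traces (majority vote).
    Needs no reward model, only that independent traces agree more often
    on the correct answer than on any particular wrong one."""
    return Counter(answers).most_common(1)[0][0]

# e.g. six traces, four of which agree:
traces = ["42", "42", "17", "42", "42", "23"]
print(consensus(traces))                            # "42"
print(best_of_n(traces, reward=lambda a: len(a)))   # toy reward for illustration
```

Note that consensus requires the traces’ final answers to be directly comparable (exact match), which fits tasks with discrete outputs like ARC-AGI.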
That’s a good point; it could be consensus.