It was all my Twitter feed was talking about, but I think it’s been really under-discussed in the mainstream press.
Re: Knoop’s comment, here are some relevant grafs from the ARC announcement blog post:
To adapt to novelty, you need two things. First, you need knowledge – a set of reusable functions or programs to draw upon. LLMs have more than enough of that. Second, you need the ability to recombine these functions into a brand new program when facing a new task – a program that models the task at hand. Program synthesis. LLMs have long lacked this feature. The o series of models fixes that.
For now, we can only speculate about the exact specifics of how o3 works. But o3’s core mechanism appears to be natural language program search and execution within token space – at test time, the model searches over the space of possible Chains of Thought (CoTs) describing the steps required to solve the task, in a fashion perhaps not too dissimilar to AlphaZero-style Monte-Carlo tree search. In the case of o3, the search is presumably guided by some kind of evaluator model. To note, Demis Hassabis hinted back in a June 2023 interview that DeepMind had been researching this very idea – this line of work has been a long time coming.
More in the ARC post.
My rough understanding is that it’s like a meta-CoT strategy, evaluating multiple different approaches.
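To make that intuition concrete, here is a minimal toy sketch of evaluator-guided best-of-N search over candidate chains of thought. Everything here is an assumption for illustration: `generate_cot` and `score_cot` are hypothetical stand-ins for the model's sampler and the speculated evaluator model, and nothing below reflects how o3 is actually implemented.

```python
import random

def generate_cot(task, rng):
    # Hypothetical stand-in: sample one candidate chain of thought,
    # represented as a list of natural-language step strings.
    n_steps = rng.randint(2, 5)
    return [f"step {i} for {task}" for i in range(n_steps)]

def score_cot(cot):
    # Hypothetical stand-in for the evaluator model: here a toy
    # heuristic that simply prefers shorter chains.
    return 1.0 / len(cot)

def search_cots(task, n_candidates=8, seed=0):
    """Best-of-N search: sample candidate CoTs, keep the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate_cot(task, rng) for _ in range(n_candidates)]
    return max(candidates, key=score_cot)

best = search_cots("example task")
```

A real system would presumably expand and prune partial chains (closer to tree search) rather than scoring only complete ones, but the basic loop — sample, evaluate, select — is the same shape.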