Just listened to the IMO team at OpenAI talk about their model. https://youtu.be/EEIPtofVe2Q?si=kIPDW5d8Wjr2bTFD Some notes:
The techniques they used are general, and especially useful for RL on problems where solution correctness is hard to verify.
It now says when it doesn’t know something, or didn’t figure it out. This is a prerequisite for training the model successfully on its own output.
The people behind the model are from the multi-agent team. For one agent to be able to work with another, the reports from the other agent need to be trustworthy.
I have no idea what the community consensus is. I doubt they’re lying.
For anyone who already had short timelines, this couldn’t shorten them that much. For instance, 2027 or 2028 is very soon, and https://ai-2027.com/ already assumed successful research would be done along the way. So for me, very little more “yikes” than yesterday.
It does not seem to me like this is the last research breakthrough needed for full-fledged AGI, either. LLMs are superhuman at tasks with little or no context buildup, but they haven’t solved context management (be that through long context windows, memory retrieval techniques, online learning, or anything else).
I also don’t think it’s surprising that these research breakthroughs keep happening. Remember that their last breakthrough (Strawberry, o1) was “make RL work”. This one might be something like “make reward prediction and MCTS work”, as in MuZero, or some other banal thing that worked on toy cases in the 80s but was nontrivial to reimplement in LLMs.
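To be concrete about what I mean by the MuZero-style recipe, here’s a minimal toy sketch: a learned model predicts reward, value, and policy priors, and MCTS plans over those predictions instead of a real simulator. The stub model and tiny action space are made up for illustration; this is not anything the OpenAI team described, just the shape of the idea.

```python
# Toy MuZero-style planning: MCTS over a learned model's reward/value/policy
# predictions. StubModel and the two-action space are invented placeholders.
import math
import random

ACTIONS = [0, 1]  # toy action space

class StubModel:
    """Stand-in for learned networks: (latent, action) ->
    (next_latent, predicted_reward, predicted_value, policy_priors)."""
    def step(self, latent, action):
        next_latent = hash((latent, action)) % 1000
        reward = random.random()   # learned reward prediction
        value = random.random()    # learned value prediction
        priors = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
        return next_latent, reward, value, priors

class Node:
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.reward = 0.0
        self.latent = None
        self.children = {}  # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def ucb(parent, child, c=1.25):
    # PUCT-style score: exploit predicted reward/value, explore via prior and visit counts.
    explore = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.reward + child.value() + explore

def mcts(model, root_latent, simulations=50):
    root = Node(prior=1.0)
    root.latent = root_latent
    _, _, _, priors = model.step(root_latent, 0)  # toy shortcut to get root priors
    root.children = {a: Node(p) for a, p in priors.items()}

    for _ in range(simulations):
        node, path = root, [root]
        # Select down the tree until we hit an unexpanded node.
        while node.children:
            action, node = max(node.children.items(),
                               key=lambda kv: ucb(path[-1], kv[1]))
            path.append(node)
        # Expand using the model's predictions (no environment rollout).
        parent = path[-2]
        latent, reward, value, priors = model.step(parent.latent, action)
        node.latent, node.reward = latent, reward
        node.children = {a: Node(p) for a, p in priors.items()}
        # Back up predicted value plus predicted rewards along the path.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = n.reward + value
    # Act according to visit counts, as MuZero does.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

if __name__ == "__main__":
    print("chosen action:", mcts(StubModel(), root_latent=0))
```

The point of the sketch is just that nothing here is exotic: the search is classic MCTS, and the “new” part is trusting learned reward/value predictions well enough to plan with them, which is exactly the kind of old idea that is nontrivial to make work at LLM scale.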