The big answer, now that we know o1 was made using Q*/Strawberry, is essentially that Strawberry/Q* did two very important things:
It cracked the code on how to make General Purpose Search scale with more compute; in particular, the model can now adaptively think for longer on harder problems.
In essence, OpenAI figured out how to implement General Purpose Search scalably:
https://www.lesswrong.com/posts/6mysMAqvo9giHC4iX/what-s-general-purpose-search-and-why-might-we-expect-to-see
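To make the "adaptively think for longer" idea concrete, here is a minimal toy sketch in Python. It is not OpenAI's method and every name in it is my own illustration; it just shows the generate-and-verify pattern where easy problems halt early and hard problems consume more of the compute budget:

```python
import random

def adaptive_search(check, candidates, max_tries=10_000):
    """Toy stand-in for adaptive test-time compute: sample candidate
    solutions and verify each one, stopping at the first success.
    Easy problems (many valid candidates) halt early; hard problems
    burn through more of the compute budget before succeeding."""
    for tries in range(1, max_tries + 1):
        guess = random.choice(candidates)
        if check(guess):            # verifying is cheap; finding is hard
            return guess, tries     # compute spent scales with difficulty
    return None, max_tries          # budget exhausted, no solution found

# An "easy" problem (half the candidates pass) vs. a "hard" one (1 in 1000).
problems = {"easy": lambda x: x % 2 == 0, "hard": lambda x: x == 777}
for name, check in problems.items():
    _, tries = adaptive_search(check, list(range(1000)))
    print(f"{name}: solved after {tries} tries")
```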
It unlocked a new inference scaling law, which in particular means that more inference-time compute reliably translates into more problems solved.
This makes AI capabilities harder to contain, since it's much easier to run large inference jobs than large training runs.
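For intuition on why an inference scaling law is hard to contain, a back-of-the-envelope model (mine, not anything OpenAI has published): if a single sample solves a problem with probability p and samples are independent, then best-of-N sampling against a verifier succeeds with probability 1 − (1 − p)^N, so buying more inference compute reliably converts into more solved problems:

```python
def pass_at_n(p_single: float, n_samples: int) -> float:
    """P(at least one of n independent samples solves the problem)."""
    return 1 - (1 - p_single) ** n_samples

# Even a 1%-per-sample problem becomes near-certain with enough inference.
for n in (1, 10, 100, 1000):
    print(f"N={n:>4}: pass@N = {pass_at_n(0.01, n):.3f}")
# N=   1: pass@N = 0.010
# N=  10: pass@N = 0.096
# N= 100: pass@N = 0.634
# N=1000: pass@N = 1.000
```

Scaling training requires frontier-lab resources; scaling N only requires more GPUs at inference time, which is exactly the containment worry.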