It was already known that the AGI labs were experimenting with synthetic data and that OpenAI is training GPT-5, and the article is light on new details:
It’s not really true that modern AIs “can’t reliably solve math problems they haven’t seen before”: this depends on the operationalization of “a math problem” and “seen before”. All this statement says is “Strawberry is better at math than the SOTA models”, which in turn means “nonzero AI progress”.
Similar for hallucinations.
The one concrete example is solving the New York Times’ Connections puzzle, but Claude 3.5 can already do that on a good day.
I mean, the state of affairs is by no means unworrying, but I don’t really see what in this article would prompt a meaningful update?
I also felt like this was mostly priced in, but here is a perhaps more useful prompt for people who feel like they made an update: this is a good time to ask “How could I have thought that faster?”, and to think about which updates you maybe still haven’t fully propagated.
Agreed, always a good exercise to do when surprised.
We knew they were experimenting with synthetic data. We didn’t know they were succeeding.
The big answer, now that we know o1 was made using Q*/Strawberry, is essentially that Strawberry/Q* did two very important things:
It cracked the code on how to make a General Purpose Search that scales with more compute; in particular, the model can now adaptively think for longer on harder problems.
In essence, OpenAI figured out how to implement General Purpose Search scalably:
https://www.lesswrong.com/posts/6mysMAqvo9giHC4iX/what-s-general-purpose-search-and-why-might-we-expect-to-see
It unlocked a new inference scaling law, which means that more inference compute reliably solves more problems.
This makes AI capabilities harder to contain, since large inference runs are much easier to come by than large training runs (a toy sketch of both points follows below).
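To make both points concrete, here is a minimal toy sketch in Python of the general shape of the idea: sample candidate solutions, check them with a verifier, and spend a larger sampling budget on harder problems. Everything in it (`solve_attempt`, `verify`, the difficulty-based budget) is a hypothetical placeholder for illustration; OpenAI has not published how o1 actually works.

```python
import random

# Toy illustration only: a minimal sketch of verifier-guided best-of-n
# sampling, NOT OpenAI's actual (unpublished) method. `solve_attempt`,
# `verify`, and the difficulty-based budget are hypothetical placeholders.

def solve_attempt(problem: str) -> str:
    """Sample one candidate solution (placeholder: a coin flip)."""
    return random.choice(["right", "wrong"])

def verify(problem: str, answer: str) -> bool:
    """Check a candidate (placeholder: exact match against 'right')."""
    return answer == "right"

def adaptive_search(problem: str, difficulty: int) -> str | None:
    """Adaptively 'think longer' on harder problems: the sampling
    budget (inference compute) grows with estimated difficulty."""
    budget = 2 ** difficulty
    for _ in range(budget):
        candidate = solve_attempt(problem)
        if verify(problem, candidate):
            return candidate
    return None  # budget exhausted without a verified answer

# The inference scaling law in miniature: if one sample succeeds with
# probability p, then n independent samples succeed with probability
# 1 - (1 - p)**n, which climbs toward 1 as n (inference compute) grows.
p = 0.01
for n in [1, 10, 100, 1000]:
    print(f"n={n:5d}  P(success) ≈ {1 - (1 - p) ** n:.3f}")
```

The last loop is the containment point in miniature: whoever can buy enough inference compute can push success rates toward 1 on problems the model only occasionally solves, with no training run required.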