I’m the chief scientist at Redwood Research.
ryan_greenblatt
I think if you look at “horizon length” (the task duration, in terms of human completion time, at which the AIs get the task right 50% of the time), the trend indicates doubling times of maybe 4 months (though 6 months is plausible). Let’s conservatively say 6 months. I think AIs are at like 30 minutes on math? And 1 hour on software engineering. It’s a bit unclear, but let’s go with that. Then, to get to 64 hours on math, we’d need 7 doublings = 3.5 years. So, I think the naive trend extrapolation is much faster than you think? (And this estimate strikes me as conservative, at least for math IMO.)
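Here’s a minimal sketch of that extrapolation (assuming a 30-minute current horizon on math, a 64-hour target, and a 6-month doubling time; these inputs are my rough guesses from above, not precise estimates):

```python
import math

def years_until_horizon(current_hours, target_hours, doubling_months):
    """Naive horizon-length extrapolation: years until the 50%-success
    horizon grows from current_hours to target_hours."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months / 12

# 30 min -> 64 hours at a 6-month doubling time: 7 doublings = 3.5 years
print(years_until_horizon(0.5, 64, 6))  # 3.5
```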
Consider tasks that quite good software engineers (maybe top 40% at Jane Street) typically do in 8 hours without substantial prior context on that exact task. (As in, an 8-hour median completion time.) Now, we’ll aim to sample these tasks such that their distribution and characteristics are close to those of work tasks in actual software engineering jobs (we probably can’t get that close because of the limited-context constraint, but we’ll try).
In short timelines, I expect AIs will be able to succeed at these tasks 70% of the time within 3-5 years, and if they don’t, I would update toward longer timelines. (This is potentially using huge amounts of inference compute and strategies that substantially differ from how humans do these tasks.)
The quantitative update would depend on how far AIs are from being able to accomplish this. If AIs were quite far (e.g., at 2 hours on this metric, which is pretty close to where they are now) and the trend on horizon length indicated N years until 64 hours, I would update to something like 3N as my median for AGI.
(I think a reasonable interpretation of the current trend indicates something like 4-month doubling times. We’re currently at a bit less than 1 hour on this metric I think, though maybe more like 30 min? Maybe you need to get to 64 hours before stuff feels pretty close to getting crazy. So, this suggests 2.3 years, though I expect longer in practice. My actual median for “AGI” in a strong sense is like 7 years, so 3x longer than this.)
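For the arithmetic behind those numbers (same naive extrapolation as above, assuming a 30-minute current horizon and a 4-month doubling time; rough inputs, not a precise estimate):

```python
import math

# Naive trend: doublings needed from ~30 min to 64 hours, times a 4-month doubling time.
doublings = math.log2(64 / 0.5)   # 7 doublings
naive_years = doublings * 4 / 12  # ~2.3 years on the raw trend
print(round(naive_years, 1), round(3 * naive_years, 1))  # 2.3, 7.0 (~3x the naive trend)
```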
Edit: Note that I’m not responding to “most impressive”, just trying to operationalize something that would make me update.
I would find this post much more useful to engage with if you more concretely described the type of tasks that you think AIs will remain bad at and gave a bunch of examples. (Or at least made an argument for why it is hard to construct examples, if that is your perspective.)
I think you’re pointing to a category like “tasks that require lots of serial reasoning for humans, e.g., hard math problems, particularly ones where the output should be a proof”. But I find this confusing, because we’ve pretty clearly seen huge progress on this in the last year, such that it seems like the naive extrapolation would imply that systems will be much better at this by the end of the year.
Already AIs seem to be not that much worse at tricky serial reasoning than smart humans:
My sense is that AIs are pretty competitive at 8th-grade competition math problems that have numerical answers and are relatively short. As in, they aren’t much worse than the best 8th graders at AIME or similar.
At proofs, the AIs are worse, but showing some signs of life.
On logic/reasoning puzzles, the AIs are already pretty good and seem to be getting better rapidly on any specific type of task, as far as I can tell.
It would be even better if you pointed to some particular benchmark and made predictions.
Sam also implies that GPT-5 will be based on o3.
IDK if Sam is trying to imply that GPT-5 will be “the AGI”, but regardless, I think we can be pretty confident that o3 isn’t capable enough to automate large fractions of cognitive labor, let alone “outperform humans at most economically valuable work” (the original OpenAI definition of AGI).
I think 0.4 is far on the lower end (maybe 15th percentile) for all the way down to one accelerated researcher, but seems pretty plausible at the margin.
As in, 0.4 suggests that 1000 researchers = 100 researchers at 2.5x speed, which seems kinda reasonable, while 1000 researchers = 1 researcher at 16x speed does seem kinda crazy / implausible.
So, I think my current median lambda at likely margins is like 0.55 or something and 0.4 is also pretty plausible at the margin.
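Here’s the arithmetic behind that comparison, under a simple model where effective research output scales like (number of researchers)^lambda times serial speed (this framing of the model is my gloss, not a quote):

```python
lam = 0.4  # assumed parallelism exponent

# 100 -> 1000 researchers is 10x more people, which multiplies effective output
# by 10**lam, i.e. it's like speeding the 100 researchers up by ~2.5x.
print(round(10 ** lam, 2))    # 2.51

# Collapsing 1000 researchers all the way down to 1 researcher would require a
# ~16x serial speedup to match, which is the part that seems implausible.
print(round(1000 ** lam, 1))  # 15.8
```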
See appendix B.3 in particular:
Competitors receive a higher score for submitting their solutions faster. Because models can think in parallel and simultaneously attempt all problems, they have an innate advantage over humans. We elected to reduce this advantage in our primary results by estimating o3’s score for each solved problem as the median of the scores of the human participants that solved that problem in the contest with the same number of failed attempts.
We could instead use the model’s real thinking time to compute ratings. o3 uses a learned scoring function for test-time ranking in addition to a chain of thought. This process is perfectly parallel and true model submission times therefore depend on the number of available GPUs during the contest. On a very large cluster the time taken to pick the top-ranked solutions is (very slightly more than) the maximum over the thinking times for each candidate submission. Using this maximum parallelism assumption and the sequential o3 sampling speed would result in a higher estimated rating than presented here. We note that because sequential test-time compute has grown rapidly since the early language models, it was not guaranteed that models would solve problems quickly compared to humans, but in practice o3 does.
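As a toy illustration of the maximum-parallelism assumption described above (the per-candidate thinking times here are made up, not real o3 data):

```python
# Hypothetical thinking times (seconds) for candidate solutions to one problem.
candidate_thinking_times = [312.0, 455.5, 128.7, 601.2]

# Fully sequential sampling: total time is the sum of all thinking times.
sequential_time = sum(candidate_thinking_times)  # ~1497 s

# Maximum-parallelism assumption: all candidates are sampled at once and the
# learned scoring function picks the top-ranked ones, so submission time is
# (very slightly more than) the slowest single candidate's thinking time.
parallel_time = max(candidate_thinking_times)    # ~601 s

print(sequential_time, parallel_time)
```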
I expect substantially more integrated systems than you do at the point when AIs are obsoleting (almost all) top human experts, such that I don’t expect these things to happen by default, and indeed I think it might be quite hard to get them to work.
METR has a list of policies here. Notably, xAI does have a policy, so that isn’t correct on the tracker.
(I found it hard to find this policy, so I’m not surprised you missed it!)
Your description of GDM’s policy doesn’t take into account the FSF update.
However, it has yet to be fleshed out: mitigations have not been connected to risk thresholds
This is no longer fully true.
I’m a bit late for a review, but I’ve recently been reflecting on decision theory and this post came to mind.
When I initially saw this post I didn’t make much of it. Based on spending more time engaging with particular problems (mostly related to acausal/simulation trade and other interactions), I now feel like the thesis of “decision theory is very confusing and messed up” is true, insightful, and pretty important. I don’t know if the specific examples in this post aged well, but I think the bottom line is worth keeping in mind.
You are possibly the first person I know of who reacted to MONA with “that’s obvious”
I also have the “that’s obvious” reaction, but possibly I’m missing some details. I also think it won’t perform well enough in practice to pencil out, given other better places to allocate safety budget (if it does trade off, which is unclear).
It’s just surprising that Sam is willing to say/confirm all of this given that AI companies normally at least try to be secretive.
I doubt that person was thinking about the opaque vector reasoning making it harder to catch the rogue AIs.
(I don’t think it’s good to add a canary in this case (the main concern would be takeover strategies, but I basically agree this isn’t that helpful), but I think people might be reacting to “might be worth adding” and are disagree-reacting to your comment because it says “are you actually serious”, which seems more dismissive than needed. IMO, we want AIs trained on this if they aren’t themselves very capable (to improve epistemics around takeover risk), and I feel close to indifferent for AIs that are plausibly very capable, as the effect on takeover plans is small and you still get some small epistemic boost.)
There are two interpretations you might have for that third bullet:
Can we stop rogue AIs? (Which are operating without human supervision.)
Can we stop AIs deployed in their intended context?
(See also here.)
In the context of “can the AIs take over?”, I was trying to point to the rogue AI interpretation. As in, even if the AIs were rogue and had a rogue internal deployment inside the frontier AI company, how do they end up with actual hard power? For catching already-rogue AIs and stopping them, opaque vector reasoning doesn’t make much of a difference.
I think there are good reasons to expect that large fractions of humans might die even if humans immediately surrender:
It might be an unstable position given that the AI has limited channels of influence on the physical world. (While if there are far fewer humans, this changes.)
The AI might not care that much, might be myopic, or might have arbitrary other motivations, etc.
For many people, “can the AIs actually take over” is a crux and seeing a story of this might help build some intuition.
Keeping the humans alive at this point is extremely cheap in terms of the fraction of long-term resource consumption, while avoiding killing humans might substantially reduce the AI’s chance of successful takeover.
Wow, that is a surprising amount of information. I wonder how reliable we should expect this to be.
Importantly, this is an example of developing a specific application (surgical robot) rather than advancing the overall field (robots in general). It’s unclear whether the analogy to an individual application or an overall field is more appropriate for AI safety.