AI 2027 is a Bet Against Amdahl’s Law
EDIT: I’ve written a followup post, summarizing and responding to the key themes raised in the comments.
AI 2027 lies on a Pareto frontier – it contains the best-researched argument for short timelines, or equivalently, the shortest timeline backed by thorough research[1]. My own timelines are substantially longer, and there are credible researchers whose timelines are longer still. For this reason, I thought it would be interesting to explore the key load-bearing arguments AI 2027 presents for short timelines. This, in turn, allows for some discussion of signs we can watch for to see whether those load-bearing assumptions are bearing out.
To be clear, while the authors have short timelines, they do not claim that ASI is likely to arrive in 2027[2]. But the fact remains that AI 2027 is a well-researched argument for short timelines. Let’s explore that argument.
(In what follows, I will mostly ignore confidence intervals and present only median estimates; this is a gross oversimplification of the results presented in the paper, but sufficient for my purpose here, which is to present the overall structure of the model and give a flavor of how that structure is fleshed out without going into overwhelming detail.)
Timeline to ASI
AI 2027 breaks down the path from present-day capabilities to ASI into five milestones[3]:
Saturate RE-Bench. This milestone is achieved when an AI reaches an average score of 1.5 on the RE-Bench suite of AI R&D tasks[4], which the authors feel matches the ability of the best human coders.
Superhuman coder (SC): “an AI system that can do any coding tasks that the best AGI company engineer does, while being much faster and cheaper[5].”
Superhuman AI researcher (SAR): “An AI system that can do the job of the best human AI researcher but faster, and cheaply enough to run lots of copies.”
Superintelligent AI researcher (SIAR): “An AI system that is vastly better than the best human researcher at AI research.”
Artificial superintelligence (ASI): “An AI system that is much better than the best human at every cognitive task.”
For the first milestone, they fit a logistic curve to historical progress on RE-Bench, and estimate that a score of 1.5 will be reached “some time in 2026”:
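For a sense of what that extrapolation involves, here is a minimal sketch of fitting a logistic curve to benchmark scores and solving for the crossing time. The data points and the assumed ceiling below are invented for illustration; they are not the actual RE-Bench data or the authors’ fitting procedure.

```python
# Minimal sketch: fit a logistic curve (with an assumed ceiling) to invented
# benchmark scores and solve for when the fitted curve crosses 1.5.
import numpy as np
from scipy.optimize import curve_fit

CEILING = 2.0  # assumed performance ceiling, a bit above the best-human score

def logistic(t, steepness, midpoint):
    return CEILING / (1.0 + np.exp(-steepness * (t - midpoint)))

# Hypothetical (year, score) observations -- NOT the real RE-Bench results.
years  = np.array([2023.0, 2023.5, 2024.0, 2024.5, 2025.0])
scores = np.array([0.10, 0.25, 0.55, 0.90, 1.15])

(steepness, midpoint), _ = curve_fit(logistic, years, scores, p0=[2.0, 2024.5])

# Invert the logistic to find when the curve reaches the target score of 1.5.
target = 1.5
t_cross = midpoint - np.log(CEILING / target - 1.0) / steepness
print(f"Score {target} projected around {t_cross:.1f}")
```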
To estimate timelines for the remaining milestones, they use a two-step process. First, they estimate the time that would be required to reach that milestone (starting from the previous milestone), without AI assistance or compute scale-up. Then they estimate how much AI will accelerate the process. Compute scale-up is excluded because they expect things to move quickly enough that compute increases will not be a dominant factor:
In this writeup we focus primarily on a possible software-driven intelligence explosion, in which there is vast improvement in AI capabilities on the scale of months-years primarily driven by using compute more efficiently (improved software), rather than more training compute. This report discusses the possibility of a software-driven intelligence explosion at length. We focus on software because the feedback loops are stronger: improved algorithms can be almost immediately applied to train better AIs, while improved hardware designs require substantial time to produce at scale (using anything like our current methods).
Here are the median estimates of the “human-only, software-only” time needed to reach each milestone:
Saturating RE-Bench → Superhuman coder: three sets of estimates are presented, with medians summing to between 30 and 75 months[6]. The reasoning is presented here. (EDIT: Eli Lifland commented that “these estimates aren’t software-only, they include recent levels of compute scaling”.)
Superhuman coder → superhuman AI researcher: 3.4 years[7]. This estimate is explained here, and relies on educated guesswork such as “perhaps whatever worked for [superhuman coding] will also work for [superhuman AI research] with a bit of extra tinkering” or (in an alternate scenario) “with human engineers doing the labor, our guess is that it would take about 2-15 years.”
Superhuman AI researcher → superintelligent AI researcher: 19 years, explained here. In brief, they estimate how long it will take to progress from “median [frontier lab] researcher” to superhuman coder, and then argue that it will take roughly twice as many “cumulative effort doublings” to progress from superhuman AI researcher to superintelligent AI researcher.
Superintelligent AI researcher → artificial superintelligence: 95 years, explained here. I honestly cannot interpret the argument here (the wording is informal and I find it to be confusing), but it includes components such as “Achieving ASI in all cognitive tasks rather than just AI R&D: About half of an SAR→SIAR jump”.
This adds up to a median estimate[8] of well over a century to achieve ASI without compute scale-up or use of AI tools. The authors then estimate the extent to which AI tools will accelerate the process:
Today: 1.03x – 1.3x
After saturating RE-Bench: 1.05x – 1.6x
Superhuman coder: 5x
Superhuman AI researcher: 25x
Superintelligent AI researcher: 250x
ASI: 2000x
By applying these speedup factors to the “human-only, software-only” time estimates given above, they arrive at a probability range for the timeline to ASI:
I note that I am confused by this diagram. In particular, the legend indicates a 90th percentile forecast of “>2100” for ASI, but the diagram appears to show the probability dropping to zero around the beginning of 2032.
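To make the structure of that calculation concrete, here is a minimal sketch of the two-step model as I understand it: take the “human-only, software-only” work for each gap, divide it by an AI R&D progress multiplier that ramps between the milestone values listed above, and sum the resulting calendar time. The geometric ramp is my own simplification rather than the authors’ actual model, I start at the superhuman coder milestone for simplicity, and the gap lengths and multipliers are just the median figures quoted in this post.

```python
# Minimal sketch of the two-step structure: for each milestone gap, divide the
# "human-only, software-only" work by an AI R&D progress multiplier that ramps
# geometrically from the value at the start of the gap to the value at the end.
# Gap lengths and multipliers are the medians quoted in this post; the ramp is
# my own simplification, not the authors' model.
gaps = [
    # (label, human-only years of work, multiplier at start, multiplier at end)
    ("SC   -> SAR ",  3.4,   5.0,   25.0),
    ("SAR  -> SIAR", 19.0,  25.0,  250.0),
    ("SIAR -> ASI ", 95.0, 250.0, 2000.0),
]

total = 0.0
for label, work_years, m_start, m_end in gaps:
    steps = 1_000
    calendar = 0.0
    for i in range(steps):
        multiplier = m_start * (m_end / m_start) ** (i / steps)  # geometric ramp
        calendar += (work_years / steps) / multiplier
    print(f"{label}: {work_years:5.1f} human-years -> {calendar:.2f} calendar years")
    total += calendar

print(f"Total from superhuman coder to ASI: about {total:.1f} calendar years")
```

Even with well over a century of human-only work, multipliers in the hundreds to thousands compress the calendar time to under a year; this is why the rest of this post focuses on whether such multipliers are achievable.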
AI 2027 Is Not Strong Evidence for AI in 2027
Several things are true about the model for ASI timelines presented as part of AI 2027:
It is an extremely forthright and thorough piece of work.
The conclusions should not be taken too seriously; all this analysis does not add up to strong evidence to expect ASI to arrive in any particular year.
The model provides a useful framework which we can use to update our expectations as new evidence accumulates. If “all models are wrong, some are useful”, this is one of the useful ones.
Why do I say that the timeline model should not be taken too seriously? It relies on quite a few guesses and handwavy estimates. The authors make vigorous attempts to analyze important parameters in multiple ways whenever possible, i.e. to provide redundant mechanisms for estimating the same quantity. However, to the best of my understanding, they were often unable to do so – the model depends on many non-redundant leaps of intuition. For instance, the following passage is central to the estimate of the timeline from a superintelligent AI researcher to ASI:
Human-only timeline from SIAR to ASI
I’ll think about this in relative terms to how long it takes to cross from SAR to SIAR human-only as forecasted above.
There are 2 gaps to cross between SIAR and ASI:
Achieving 2 (median→best jumps) above the best human when looking at the whole field rather than a single company: a bit less than an SAR→SIAR jump
For AI R&D, my guess is that this requires a bit less of a jump as SAR→SIAR, because the median ML professional is a bit better than the worst AGI company researcher (if the worse researcher were as much worse than the median as the median was compared to the best, which may not be true in practice due to hiring cutoffs).
Achieving ASI in all cognitive tasks rather than just AI R&D: About half of an SAR→SIAR jump.
I think once an AI is extremely good at AI R&D, lots of these skills will transfer to other domains, so it won’t have to be that much more capable to generalize to all domains, especially if trained in environments designed for teaching general skills.
This passage contains multiple guesses – the gap from SIAR to ASI decomposing into (1) and (2), and the estimated scale of each component. Again, the overall model is an accumulation of many such intuitive leaps, primarily in serial rather than parallel, meaning that our confidence in the final result should be substantially less than our confidence in any individual step.
(To be clear, this is a much better model than I could have constructed. My intention is not to criticize, but merely to describe.)
The upshot is that I find it difficult to accept the AI 2027 model as strong evidence for short timelines, or indeed for any timeline in particular. The authors include expert forecasters, who understand how to estimate in the face of uncertainty and calculate probability distributions much better than I do, but I simply find it impossible to accept this concatenation of intuitive leaps as sufficient evidence to update very far.
In the next section, I’ll present some specific reasons I believe the AI 2027 model underestimates timelines. I’ll then conclude with some thoughts regarding what signs we should watch for to disambiguate whether things are playing out according to the AI 2027 model or my alternative assumptions.
Reasons The AI 2027 Forecast May Be Too Aggressive
#1: Simplified Model of AI R&D
The model doesn’t do much to explore the multitude of activities that go into AI R&D. There’s more than just designing, executing, and analyzing experiments. The process also includes collecting and cleaning data, designing evaluations, managing training runs, “vibe checks”, safety evaluations, and many other actions. Over time, models, use cases, and post-training processes are becoming more complex, and AI R&D will likely expand to encompass entirely new activities requiring different sorts of expertise. It seems likely that specific domains (such as coding, technical research, business planning, legal analysis, …) may each require unique approaches for generating synthetic data and evaluating outputs, probably with involvement from experts in those domains. I’m sure I’ve only scratched the surface – AI R&D is going to encompass a rich and heterogeneous variety of activities.
Inevitably, some of these activities will be harder to automate than others, delaying the overall timeline. It seems difficult to route around this problem. For instance, if it turns out to be difficult to evaluate the quality of model outputs for fuzzy / subjective tasks, it’s not clear how an R&D organization (regardless of how much or little automation it has incorporated) could rapidly improve model capabilities on those tasks, regardless of how much progress is being made in other areas.
#2: Amdahl’s Law
The model estimates a century-plus timeline to ASI, and then projects the work to take place over a few years of calendar time, on the expectation that AI tools will be accelerating progress by factors eventually exceeding 1000x. Such extreme speedups are only possible if acceleration is near universal, i.e. only if every tiny detail of the R&D process is amenable to acceleration.
The authors address this objection, but the counterargument strikes me as flawed. Here is the key paragraph:
To see why this is conceptually mistaken, consider a theoretical AI with very superhuman experiment selection capabilities but sub-human experiment implementation skills. Even if automation didn’t speed up implementation of AI experiments at all and implementation started as 50% of researchers’ time, if automation led to much better experiments being chosen, a >2x AI R&D progress multiplier could be achieved.
In essence, this is saying that if the pace of progress is the product of two factors (experiment implementation time, and quality of experiment choice), then AI only needs to accelerate one factor in order to achieve an overall speedup. However, AI R&D involves a large number of heterogeneous activities, and overall progress is not simply the product of progress in each activity. Not all bottlenecks will be easily compensated for or worked around.
Also remember that we are talking about very large speedups here. In practice, Amdahl’s Law often starts to bite when optimizing a system by factors as small as 2x. Projecting speedups reaching 2000x is “pricing to perfection”; if the ability to route around difficult-to-automate activities is anything short of perfect, progress will fall short of the anticipated speedup curves.
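To put numbers on “pricing to perfection”: below is a minimal sketch of the standard Amdahl’s Law calculation, using arbitrary illustrative fractions rather than real estimates of how AI R&D effort breaks down. Even a sliver of work left at human speed caps the overall speedup far below a 2000x target.

```python
# Amdahl's Law: if a fraction f of the work is not accelerated at all and the
# rest is accelerated by a factor s, the overall speedup is 1 / (f + (1 - f)/s).
def amdahl_speedup(unaccelerated_fraction: float, accelerated_factor: float) -> float:
    f = unaccelerated_fraction
    return 1.0 / (f + (1.0 - f) / accelerated_factor)

# Suppose everything else is sped up 2000x; how much does the leftover
# human-speed fraction matter? (Illustrative fractions, not real estimates.)
for f in [0.0005, 0.001, 0.01, 0.05, 0.10]:
    print(f"{f:7.2%} unaccelerated -> overall speedup {amdahl_speedup(f, 2000.0):7.1f}x")
```

In other words, realizing a 2000x overall speedup requires that essentially nothing remain at human speed.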
Looking at the quote above, I’ll note that “choosing better experiments” is a relatively advanced skill, which will likely not emerge until well after experiment implementation skills. More generally, the high-level decision-making skills needed to maintain accelerating progress in the face of powerful-but-uneven AI capabilities seem like they would not emerge until late in the game. The model assumes very high speedup factors (25x to 250x), implying very broad and robust capabilities, quite far in advance of ASI.
#3: Dependence on Narrow Data Sets
To the extent that the model is grounded in concrete measurements of AI capabilities, those measurements primarily come from benchmarks such as HCAST and RE-Bench, which primarily contain tidily encapsulated tasks from software engineering and related domains. We have very little data as to how far models will have to progress in order to tackle higher-level AI R&D skills such as “research taste”, let alone tasks outside of software engineering and AI R&D. And it seems likely that there is a strong correlation between tasks that are easy to benchmark, and tasks that are easy to train – we measure AIs on the things they’re best at.
While current AI models and tools are demonstrating substantial value in the real world, there is nevertheless a notorious gap between benchmark scores (“Ph.D level” and beyond) and real-world applicability. It strikes me as highly plausible that this reflects one or more as-yet-poorly-characterized chasms that may be difficult to cross.
#4: Hofstadter’s Law As Prior
Readers are likely familiar with Hofstadter’s Law:
It always takes longer than you expect, even when you take into account Hofstadter’s Law.
It’s a good law. There’s a reason it exists in many forms (see also the Programmer’s Credo[9], the 90-90 rule, Murphy’s Law, etc.) It is difficult to anticipate all of the complexity and potential difficulties of a project in advance, and on average this contributes to things taking longer than expected. Constructing ASI will be an extremely complex project, and the AI 2027 attempt to break it down into a fairly simple set of milestones and estimate the difficulty of each milestone seems like fertile territory for Hofstadter’s Law.
What To Watch For
The AI 2027 model incorporates many factors, but at the end of the day, the prediction of short timelines hinges on very high rates of acceleration of AI R&D. More precisely, it assumes extreme acceleration of a set of activities, beginning with a fairly narrow range of coding tasks, and then expanding quite broadly, to include all of the activities involved in developing new AI models, including high-level cognitive faculties such as “research taste” as well as any domain-specific work needed to inculcate expertise in various domains.
More simply, the model assumes that AI-driven speedups will be deep (very high acceleration factors) and broad (across a wide variety of activities). Depth has already been achieved; AI tools can already massively accelerate some tasks, such as coding simple and stereotypical video games[10]. The challenge will be going broad – expanding the range of tasks for which AI can provide high speedup factors. This suggests some things we can watch for:
What is the profile of acceleration across all tasks relating to AI R&D? What percentage of tasks are getting accelerated by 1.1x, 1.5x, 2x? If we see at least a modest uplift for a steadily increasing range and scope of tasks, that would be a sign that the objections listed above are not choking progress. If we see strong uplift for some tasks, but the set of uplifted tasks remains constrained and/or the level of human supervision required is not shrinking (again, across a broad range of tasks), that will be evidence that the model is overly optimistic.
What is the profile of AI uplift beyond AI R&D? Is the real-world applicability of AI to AI R&D being mirrored in a broader range of tasks and jobs? This will shed light on the gap from a superintelligent AI researcher to ASI.
As a prerequisite, it will be necessary to enumerate the set of activities that are necessary for “AI R&D”, as well as (for ASI) the broader range of cognitive tasks that humans undertake. I am not aware of a serious attempt at either task (on the latter subject, see my blog post, If AGI Means Everything People Do… What is it That People Do?). Characterizing the full range of activities involved in advancing AI capabilities would be a valuable contribution to timeline modeling.
- ^
Of course, the timeline forecast is not the only important contribution of AI 2027.
- ^
From https://controlai.news/p/special-edition-the-future-of-ai:
Eli: The scenario [referring, I believe, to “artificial superintelligence arriving in December 2027”] is roughly my 80th percentile speed, i.e. I assign 20% to things going at least this fast. This is similar to Scott Alexander’s view. So I find it very plausible but not my median scenario. It is however roughly my modal view, I think 2027 or 2028 is the most likely year that superhuman coders [note, not AGI or ASI] arrive.
- ^
I’m presenting things in a slightly different way than the source material. This is my best attempt to summarize the analysis and make it easy to assimilate. If I have distorted the argument in any way, I welcome corrections.
- ^
Actually “a subset of 5 of the 7 RE-Bench tasks due to issues with scoring in the remaining two”.
- ^
A longer definition is also provided:
Superhuman coder (SC): An AI system for which the company could run with 5% of their compute budget 30x as many agents as they have human research engineers, each of which is on average accomplishing coding tasks involved in AI research (e.g. experiment implementation but not ideation/prioritization) at 30x the speed (i.e. the tasks take them 30x less time, not necessarily that they write or “think” at 30x the speed of humans) of the company’s best engineer. This includes being able to accomplish tasks that are in any human researchers’ area of expertise.
- ^
The paper also presents an alternative method for estimating steps 1+2, extrapolating from the recent METR paper Measuring AI Ability to Complete Long Tasks.
- ^
Actually “15% 0 years; Otherwise 4 years (80% CI: 1.5 to 10; lognormal)”.
- ^
I am being very sloppy with my statistics here; for instance, it’s probably not valid to add the median estimates for several steps in a process and present the total as a median estimate of the overall process.
- ^
“We do these things not because they are easy, but because we thought they were going to be easy.”
- ^
This choice of a relatively frivolous example is not meant to be dismissive; it was just the first example that came to mind where people seem to be routinely vibe-coding fairly complex pieces of software. There are probably examples that are directly relevant to AI R&D, but I don’t have them at the tip of my tongue.
AI 2027 is more useful for the arguments than the specific year, but even if not as aggressive, prediction markets (or at least Manifold) predict a 61% chance before 2030, 65% before 2031, and 73% by 2033.
I, similarly, can see it happening slightly later than 2027-2028 because some specific issues take longer to solve than others but I see no reason to think a timeline beyond 2035, like yours, let alone 30 years is grounded in reality.
It also doesn’t help that when I look at your arguments and apply them to what would then seem to be very optimistic forecast in 2020 about progress in 2025 (or even Kokotajlo’s last forecast), those same arguments would have similarly rejected what has happened.
I wouldn’t take those markets too seriously. The resolution criteria aren’t clear and some years have fewer than 100 traders. Also I just moved some of them down a couple of percentage points.
Great post, I agree with everything you say in the first section. I disagree with your bottlenecks / Amdahl’s Law objection for reasons Ryan mentions; I think our analysis stands firm / takes those bottlenecks into account. (Though tbc we are very uncertain, more research is needed.) As for Hofstadter’s Law, I think it is basically just the planning fallacy, and yeah I think it’s a reasonable critique that insofar as our AI timelines are basically formed by doing something that looks like planning, we probably have a bias we need to correct for. I want to think more about the extent to which our timelines methodology is analogous to planning.
Thanks! I agree that my statements about Amdahl’s Law primarily hinge on my misunderstanding of the milestones, as elucidated in the back-and-forth with Ryan. I need to digest that; as Ryan anticipates, possibly I’ll wind up with thoughts worth sharing regarding the “human-only, software-only” time estimates, especially for the earlier stages, but it’ll take me some time to chew on that.
(As a minor point of feedback, I’d suggest adding a bit of material near the top of the timelines and/or takeoff forecasts, clarifying the range of activities meant to be included in “superhuman coder” and “superhuman AI researcher”, e.g. listing some activities that are and are not in scope. I was startled to see Ryan say “my sense is that an SAR has to be better than humans at basically everything except vision”; I would never have guessed that was the intended interpretation.)
(“Has to” is maybe a bit strong, I think I probably should have said “will probably end up needing to be competitive with the best human experts at basically everything (other than vision) and better at more central AI R&D given the realistic capability profile”. I think I generally expect full automation to hit everywhere all around the same time putting aside vision and physical tasks.)
This is fair. To the extent we have chosen what activities to include, it’s supposed to encompass everything that any researcher/engineer currently does to improve AIs’ AI R&D capabilities within AGI companies, see the AI R&D progress multiplier definition: “How much faster would AI R&D capabilities...”. As to whether we should include activities that researchers or engineers don’t do, my instinct is mostly no because the main thing I can think of there is data collection, and that feels like it should be treated separately (in the AI R&D progress multiplier appendix, we clarify that using new models for synthetic data generation isn’t included in the AI R&D progress multiplier as we want to focus on improved research skills, though I’m unsure if that’s the right choice and am open to changing it).
But I did not put a lot of effort into thinking about how exactly to define the range of applicable activities and what domains should be included; my intuition is that it matters less than you think because I expect automation to be less jagged than you do (I might write more about that in a separate comment) and because of intuitions that research taste is the key skill and is relatively domain-general, though I agree expertise helps. I agree that there will be varying multipliers depending on the domain, but given that the takeoff forecast is focused mostly on a set of AI R&D-specific milestones, I think it makes sense to focus on that.
I’m worried that you’re missing something important because you mostly argue against large AI R&D multipliers, but you don’t spend much time directly referencing compute bottlenecks in your arguments that the forecast is too aggressive.
Consider the case of doing pure math research (which we’ll assume for simplicity doesn’t benefit from compute at all). If we made emulated versions of the 1000 best math researchers and then we made 1 billion copies of each of them, which all ran at 1000x speed, I expect we’d get >1000x faster progress. As far as I can tell, the words in your arguments don’t particularly apply less to this situation than the AI R&D situation.
Going through the object-level response for each of these arguments, in the case of pure math research and the corresponding AI R&D situation:
Math: Yes, there are many tasks in math R&D, but the 1000 best math researchers could already do them or learn to do them.
AI R&D: By the time you have SAR (superhuman AI researcher), we’re assuming the AIs are better than the best human researchers(!), so heterogeneous tasks don’t matter if you accept the premise of SAR: whatever the humans could have done, the AIs can do better. It does apply to the speed ups at superhuman coders, but I’m not sure this will make a huge difference to the bottom line (and you seem to mostly be referencing later speed ups).
Math: The speed up is near universal because we can do whatever the humans could do.
AI R&D: Again, the SAR is strictly better than humans, so hard-to-automate activities aren’t a problem. When we’re talking about ~1000x speed up, the authors are imagining AIs which are much smarter than humans at everything and which are running 100x faster than humans at immense scale. So, “hard to automate tasks” is also not relevant.
All this said, compute bottlenecks could be very important here! But the bottlenecking argument must directly reference these compute bottlenecks and there has to be no way to route around this. My sense is that much better research taste and perfect implementation could make experiments with some fixed amount of compute >100x more useful. To me, this feels like the important question: how much can labor route around compute bottlenecks and utilize compute much more effectively? The naive extrapolation out of the human range makes this look quite aggressive: the median AI company employee is probably 10x worse at using compute than the best, so an AI which is as superhuman as 2x the gap between median and best would naively be 100x better at using compute than the best employee. (Is the research taste ceiling plausibly this high? I currently think extrapolating out another 100x is reasonable given that we don’t see things slowing down in the human range as far as we can tell.)
This is only applicable to the timeline to the superhuman coder milestone, not to takeoff speeds once we have a superhuman coder. (Or maybe you think a similar argument applies to the time between superhuman coder and SAR.)
Math: We’re talking about speed up relative to what the human researchers would have done by default, so this just divides both sides equally and cancels out.
AI R&D: This should also just divide both sides. That said, Hofstadter’s Law does apply to the human-only, software-only times between milestones. But note that these times are actually quite long! (Maybe you think they are still too short, in which case fair enough.)
I’ve (briefly) addressed the compute bottleneck question on a different comment branch, and “hard-to-automate activities aren’t a problem” on another (confusion regarding the definition of various milestones).
I do think it applies, if indirectly. Most data relating to progress in AI capabilities comes from benchmarks of crisply encapsulated tasks. I worry this may skew our collective intuitions regarding progress toward broader capabilities, especially as I haven’t seen much attention paid to exploring the delta between things we currently benchmark and “everything”.
This feels like one of those “the difference between theory and practice is smaller in theory than in practice” situations… Hofstadter’s Law would imply that Hofstadter’s Law applies here. :-)
For one concrete example of how that could manifest, perhaps there is a delay between “AI models exist that are superhuman at all activities involved in developing better models” and “those models have been fully adopted across the organization”. Interior to a frontier lab, that specific delay might be immaterial, it’s just meant as an existence proof that there’s room for us to be missing things.
Imagine if evolution could talk. “Yes, humans are very intelligent, but surely they couldn’t create airplanes 50,000 times heavier than the biggest bird in only 1,000 years. Evolution takes millions of years, and even if you can speed up some parts of the process, other parts will remain necessarily slow.”
But maybe the most ambitious humans do not even consider waiting millions of years, and making incremental improvements on million year techniques. Instead, they see any technique which takes a million years as a “deal breaker,” and only make use of techniques which they can use within the timespan of years. Yet humans are smart enough and think fast enough that even when they restrict themselves to these faster techniques, they can still eventually build an airplane, one much heavier than birds.
Likewise, an AI which is smart enough and thinks fast enough, might still eventually invent a smarter AI, one much smarter than itself, even when restricted to techniques which don’t require months of experimentation (analogous to evolution). Maybe just by training very small models very quickly, they can discover a ton of new technologies which can scale to large models. State-of-the-art small models (DeepSeek etc.) already outperform old large models. Maybe they can invent new architectures, new concepts, and who knows what.
In real life, there might be no fine line between slow techniques and fast techniques, but rather a gradual transition from approaches which rely more on slower techniques to approaches which rely on them less.
This is valid, but doesn’t really engage with the specific arguments here. By definition, when we consider the potential for AI to accelerate the path to ASI, we are contemplating the capabilities of something that is not a full ASI. Today’s models have extremely jagged capabilities, with lots of holes, and (I would argue) they aren’t anywhere near exhibiting sophisticated high-level planning skills able to route around their own limitations. So the question becomes, what is the shape of the curve of AI filling in weak capabilities and/or developing sophisticated strategies for routing around those weaknesses?
This is exactly missing the point. Training a cutting-edge model today involves a broad range of activities, not all of which fall under the heading of “discovering technologies” or “improving algorithms” or whatever. I am arguing that if all you can do is find better algorithms rapidly, that’s valuable but it’s not going to speed up overall progress by very large factors. Also, it may be that “by training very small models very quickly”, the AI would discover new technologies that improve some aspects of models but fail to advance some other important aspects.
Yeah, sorry I didn’t mean to argue that Amdahl’s Law and Hofstadter’s Law are irrelevant, or that things are unlikely to go slowly.
I see a big chance that it takes a long time, and that I end up saying you were right and I was wrong.
However, if you’re talking about “contemplating the capabilities of something that is not a full ASI. Today’s models have extremely jagged capabilities, with lots of holes, and (I would argue) they aren’t anywhere near exhibiting sophisticated high-level planning skills able to route around their own limitations.”
That seems to apply to the 2027 “Superhuman coder” with 5x speedup, not the “Superhuman AI researcher” with 25x speedup or “Superintelligent AI researcher” with 250x.
I think “routing around one’s own limitations” isn’t necessarily that sophisticated. Even blind evolution does it, by trying something else when one thing fails.
As long as the AI is “smart enough,” even if they aren’t that superhuman, they have the potential to think many times faster than a human, with a “population” many times greater than that of AI researchers. They can invent a lot more testable ideas and test them all.
Maybe I’m missing the point, but it’s possible that we simply disagree on whether the point exists. You believe that merely discovering technologies and improving algorithms isn’t sufficient to build ASI, while I believe there is a big chance that doing that alone will be sufficient. After discovering new technologies from training smaller models, they may still need one or two large training runs to implement it all.
I’m not arguing that you don’t have good insights :)
One reason I don’t put much weight on this for timelines forecasts is that to the extent I might have done so before, I would have been more wrong from my current view. For example, my AGI timelines median 3 years ago was 2060ish, and since then I’ve updated toward an AGI median of more like 2031 due to reasons including underpredicting benchmark scores, underpredicting real-world impacts, and the model we built for AI 2027.
(wow, I didn’t remember that my median 3 years ago was 2060ish, wild)
I think it’s just that the tail is very long and flat with <1% per year. So, it looks like it goes to zero, but it stays just above.
Exactly. More fundamentally, that is not a probability graph, it’s a probability density graph, and we’re not shown the line beyond 2032 but just have to assume the integral from 2100-->infinity is >10% of the integral from 0-->infinity. Infinity is far enough away that the decay doesn’t even need to be all that slow for the total to be that high.
Thanks everyone for all the feedback and answers to my unending questions! The branching comments are starting to become too much to handle, so I’m going to take a breather and then write a followup post – hopefully by the end of the week but we’ll see – in which I’ll share some consolidated thoughts on the new (to me) ideas that surfaced here and also respond to some specific points.
This step, especially, really struck me as under-argued relative to how important it seems to be for the conclusion. This isn’t to pick on the authors of AI 2027 in particular. I’m generally confused as to why arguments for an (imminent) intelligence explosion don’t say more on this point, as far as I’ve read. (I’m reminded of this comic.) But I might well have missed something!
The basic arguments are that (a) becoming fully superhuman at something which involves long-horizon agency across a diverse range of situations seems like it requires agency skills that will transfer pretty well to other domains (b) once AIs have superhuman data efficiency, they can pick up whatever domain knowledge they need for new tasks very quickly.
I agree we didn’t justify it thoroughly in our supplement, the reason it’s not justified more is because we didn’t get around to it.
Couldn’t the Amdahl’s Law argument work in the opposite direction (i.e. even shorter timelines)?
Suppose AI R&D conducted by humans would take 100 years to achieve ASI. By Amdahl’s Law, there is likely some critical aspect of research that humans are particularly bad at, that causes the research to take a long time. An SAR might be good at the things humans are bad at (in a way that can’t be fixed by humans + AI working together—human + calculator is much better at arithmetic than human alone, but human + AlphaGo isn’t better at Go). So SAR might be able to get ASI in considerably less than 100 human-equivalent-years.
It seems to me that, a priori, we should expect Amdahl’s Law to affect humans and SARs to the same degree, so it shouldn’t change our time estimate. Unless there is some specific reason to believe that human researchers are less vulnerable to Amdahl’s Law; I don’t know enough to say whether that’s true.
That’s not how the math works. Suppose there are 200 activities under the heading of “AI R&D” that each comprise at least 0.1% of the workload. Suppose we reach a point where AI is vastly superhuman at 150 of those activities (which would include any activities that humans are particularly bad at), moderately superhuman at 40 more, and not much better than human (or even worse than human) at the remaining 10. Those 10 activities where AI is not providing much uplift comprise at least 1% of the AI R&D workload, and so progress can be accelerated at most 100x.
This is oversimplified; there is some room for superhuman ability in one area (making excellent choices of experiments to run) to compensate for lack of uplift in other areas (time to code and execute individual experiments). But the fundamental point remains: a complex process can be bottlenecked by its slowest step. Amdahl’s Law is not symmetric – a chain can’t be as strong as its strongest link.
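A minimal sketch of the arithmetic behind that cap, treating overall progress as work-weighted (so the overall speedup is a harmonic mean of per-activity speedups). This ignores the compensation effects just mentioned; the 1% share is the illustrative figure from the paragraph above.

```python
# Work-weighted bottleneck arithmetic: if the 10 barely-accelerated activities
# are at least 1% of the human workload and get no uplift, overall speedup is
# capped at ~100x no matter how fast everything else becomes.
no_uplift_share = 10 * 0.001   # 10 activities x 0.1% of the workload each
remaining_share = 1.0 - no_uplift_share

for other_speedup in [10.0, 100.0, 1_000.0, float("inf")]:
    total_ai_time = no_uplift_share / 1.0 + remaining_share / other_speedup
    print(f"rest of the work at {other_speedup:>6}x -> overall {1.0 / total_ai_time:6.1f}x")
```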
Another way to put this disagreement is that you can interpret all of the AI 2027 capability milestones as referring to the capability of the weakest bottlenecking capability, so:
Superhuman coder has to dominate all research engineers at all pure research engineering tasks. This includes the most bottlenecking capability.
SAR has to dominate all human researchers, which must include whatever task would otherwise bottleneck.
SIAR (superintelligent AI researcher) has to be so good at AI research (the gap between SAR and SIAR is 2x the gap between an automated median AGI company researcher and a SAR) that it has this huge 2x-gap advantage over the SAR despite the potentially bottlenecking capabilities.
So, I think perhaps what is going on is that you mostly disagree with the human-only, software-only times and are plausibly mostly on board otherwise.
I think my short, narrowly technical response to this would be “agreed”.
Additional thoughts, which I would love your perspective on:
1. I feel like the idea that human activities involved in creating better models are broader than just, like, stereotypical things an ML Ph.D would do, is under-explored. Elsewhere in this thread you say “my sense is that an SAR has to be better than humans at basically everything except vision.” There’s a lot to unpack there, and I don’t think I’ve seen it discussed anywhere, including in AI 2027. Do stereotypical things an ML Ph.D would do constitute 95% of the work? 50%? Less? Does the rest of the work mostly consist of other sorts of narrowly technical software work (coding, distributed systems design, etc.), or is there broad spillover into other areas of expertise, including non-STEM expertise? What does that look like? Etc.
(I try to make this point a lot, generally don’t get much acknowledgement, and as a result have started to feel a bit like a crazy person. I appreciate you giving some validation to the idea. Please let me know if you suspect I’ve over-interpreted that validation.)
1a. Why “except vision”? Does an SAR have to be superhuman at creative writing, so that it can push forward creative writing capabilities in future models? (Obviously, substitute any number of other expertise domains for “creative writing”.) If yes, then why doesn’t it also need to be superhuman at vision (so that it can push forward vision capabilities)? If no, then presumably creative writing is one of the exceptions implied by the “basically” qualifier, what else falls in there?
2. “Superhuman AI researcher” feels like a very bad term for a system that is meant to be superhuman at the full range of activities involved in producing better models. It strongly suggests a narrower set of capabilities, thus making it hard to hold onto the idea that a broad definition is intended. Less critically, it also seems worthwhile to better define what is meant to fall within the umbrella of “superhuman coder”.
3. As I read through AI 2027 and then wrote my post here, I was confused as to the breadth of skills meant to be implied by “superhuman coder” and (especially) “superhuman AI researcher”, and probably did not maintain a consistent definition in my head, which may have confused my thinking.
4. I didn’t spend much time evaluating the reasoning behind the estimated speedups at each milestone (5x, 25x, 250x, 2000x). I might have more to say after digging into that. If/when I find the time, that, plus the discussion we’ve just had here, might be enough grist for a followup post.
Slightly? My view is more like:
For AIs to be superhuman AI researchers, they probably need to match humans at most underlying/fundamental cognitive tasks, including reasonably sample efficient learning. (Or at least learning which is competitive with humans given the AIs structural advantages.)
This means they can probably learn how to do arbitrary things pretty quickly and easily.
I think non-ML/software-engineering expertise (that you can’t quickly learn on the job) is basically never important in building more generally capable AI systems aside from maybe various things related to acquiring data from humans. (But IMO this won’t ultimately be needed.)
Do human ML researchers have to be superhuman at creative writing to push forward creative writing capabilities? I don’t particularly think so. Data might need to come from somewhere, but in the vision case, there are plenty of approaches which don’t require AIs with superhuman vision.
In the creative writing case, it’s a bit messy because the domain is intrinsically subjective. I nonetheless think you could make an AI which is superhuman at creative writing without good understanding of creative writing using just the (vast vast) quantity of data we already have on the internet.
Thanks.
I’m now very strongly feeling the need to explore the question of what sorts of activities go into creating better models, what sorts of expertise are needed, and how that might change as things move forward. Which unfortunately I know ~nothing about, so I’ll have to find some folks who are willing to let me pick their brains...
I think this is a good question. I’d love to hear from people with experience building frontier models have to say about it.
Meanwhile, my first pass at decomposing “activities that go into creating better models” into some distinct components that might be relevant in this discussion:
1. Core algorithmic R&D: choose research questions, design & execute experiments, interpret findings
2. ML engineering: build & maintain distributed training setup, along with the infra and dev ops that go along with a complex software system
3. Data acquisition and curation: collect, filter, clean datasets; hire humans to produce/QA; generate synthetic data
4. Safety research and evaluation: red-teaming, interpretability, safety-specific evals, AI-assisted oversight, etc.
5. External productization: product UX and design, UX-driven performance optimization, legal compliance and policy, marketing, and much more.
6. Physical compute infrastructure: GPU procurement, data center building and management, power procurement, likely various physical logistics.
(I wonder what’s missing from this?)
Eli suggested above that we should bracket the issue of data. And I think it’s also reasonable to set aside 4 and 5 if we’re trying to think about how quickly a lab could iterate internally.
If we do that, we’re left with 1, 2, and 6. I think 1 and 2 are covered even by a fairly narrow definition of “superhuman (AI researcher + coder)”. I’m uncertain what to make of 6, besides having a generalized “it’s probably messier and more complicated than I think” kind of feeling about it.
This, and the same description for the other milestones, aren’t completely right; it’s possible that there are some activities on which the SAR is worse. But it can’t be many activities and it can’t be much worse at them, given that the SAR needs to overall be doing the job of the best human researcher 30x faster.
I think my description is consistent with “some activities on which the SAR is worse” as long as these aren’t bottlenecking and it is overall dominating human researchers (as in, adding human researchers adds negligible value).
But whatever, you’re the author here.
Maybe “Superhuman coder has to dominate all research engineers at all pure research engineering tasks” is too strong though.
Ok yeah, seems like this is just a wording issue and we’re on the same page.
Hmm, I think your argument is roughly right, but missing a key detail. In particular, the key aspect of the SARs (and higher levels of capability) is that they can be strictly better than humans at everything while simultaneously being 30x faster and 30x more numerous. (Or, there is 900x more parallel labor, but we can choose to run this as 30x more parallel instances each running 30x faster.)
So, even if these SARs are only slightly better than humans at these 10 activities and these activities don’t benefit from parallelization at all, they can still do them 30x faster!
So, progress can actually be accelerated by up to 3000x even if the AIs are only as good as humans at these 10 activities and can’t productively dump in more labor.
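To spell out that arithmetic with a minimal work-weighted sketch (the 1% hard-to-automate share is purely illustrative): doing the hard slice 30x faster, rather than at human speed, moves the cap from about 100x to about 3000x even if nothing else about that slice improves.

```python
# If the hard-to-automate share of the workload is done 30x faster instead of
# at human speed (with everything else effectively free), the Amdahl-style cap
# rises from ~100x to ~3000x.
hard_share = 0.01  # illustrative: 1% of the human workload
for hard_speedup in [1.0, 30.0]:
    cap = 1.0 / (hard_share / hard_speedup)
    print(f"hard tasks at {hard_speedup:>4.0f}x speed -> overall cap ~{cap:,.0f}x")
```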
In practice, I expect that you can often pour more labor into whatever bottlenecks you might have. (And compensate etc as you noted.)
By the time the AIs have a 1000x AI R&D multiplier, they are running at 100x human speed! So, I don’t think the argument for “you won’t get 1000x uplift” can come down to an Amdahl’s Law argument about the automation itself. It will have to depend on compute bottlenecks.
(My sense is that the progress multipliers in AI 2027 are too high but also that the human-only times between milestones are somewhat too long. On net, this makes me expect somewhat slower takeoff with a substantial chance on much slower takeoff.)
This is valid for activities which benefit from speed and scale. But when output quality is paramount, speed and scale may not always provide much help?
My mental model is that, for some time to come, there will be activities where AIs simply aren’t very competent at all, such that even many copies running at high speed won’t provide uplift. For instance, if AIs aren’t in general able to make good choices regarding which experiments to run next, then even an army of very fast poor-experiment-choosers might not be worth much; we might still need to rely on people to choose experiments. Or if AIs aren’t much good at evaluating strategic business plans, it might be hard to train AIs to be better at running a business (a component of the SIAR → ASI transition) without relying on human input for that task.
For Amdahl’s Law purposes, I’ve been shorthanding “incompetent AIs that don’t become useful for a task even when taking speed + scale into account” as “AI doesn’t provide uplift for that task”.
EDIT: of course, in practice it’s generally at least somewhat possible to trade speed+scale for quality, e.g. using consensus algorithms, or generate-and-test if you have a good way of identifying the best output. So a further refinement is to say that very high acceleration requires us to assume that this does not reach importantly diminishing returns in a significant set of activities.
EDIT2:
I find this quite plausible.
Sure, but for output quality better than what humans could (ever) do to matter for the relative speed up, you have to argue about compute bottlenecks, not Amdahl’s law for just the automation itself! (As in, if some humans would have done something in 10 years and it doesn’t have any environmental bottleneck, then 10x faster emulated humans can do it in 1 year.)
Notably, SAR is defined as “Superhuman AI researcher (SAR): An AI system that can do the job of the best human AI researcher but faster, and cheaply enough to run lots of copies.” So, it is strictly better than the best human researcher(s)! So, your statement might be true, but is irrelevant if we’re conditioning on SAR.
It sounds like your actual objection is in the human-only, software-only time from superhuman coder to SAR (you think this would take more than 1.5-10 years).
Or perhaps your objection is that you think there will be a smaller AI R&D multiplier for superhuman coders. (But this isn’t relevant once you hit full automation!)
I’m having trouble parsing this sentence… which may not be important – the rest of what you’ve said seems clear, so unless there’s a separate idea here that needs responding to then it’s fine.
Agreed that these two statements do a fairly good job of characterizing my objection. I think the discussion is somewhat confused by the term “AI researcher”. Presumably, for an SAR to accelerate R&D by 25x, “AI researcher” needs to cover nearly all human activities that go into AI R&D? And even more so for SIAR/250x. While I’ve never worked at an AI lab, I presume that the full set of activities involved in producing better models is pretty broad, with tails extending into domains pretty far from the subject matter of an ML Ph.D and sometimes carried out by people whose job titles and career paths bear no resemblance to “AI researcher”. Is that a fair statement?
If “producing better models” (AI R&D) requires more than just narrow “AI research” skills, then either SAR and SIAR need to be defined to cover that broader skill set (in which case, yes, I’d argue that 1.5-10 years is unreasonably short for unaccelerated SC->SAR), or if we stick with narrower definitions for SAR and SIAR then, yes, I’d argue for smaller multipliers.
You said “This is valid for activities which benefit from speed and scale. But when output quality is paramount, speed and scale may not always provide much help?”. But, when considering activities that aren’t bottlenecked on the environment, then to achieve 10x acceleration you just need 10x more speed at the same level of capability. In order for quality to be a crux for a relative speed up, there needs to be some environmental constraint (like you can only run 1 experiment).
Yep, my sense is that an SAR has to[1] be better than humans at basically everything except vision.
(Given this, I currently expect that SAR comes at basically the same time as “superhuman blind remote worker”, at least when putting aside niche expertise which you can’t learn without a bunch of interaction with humans or the environment. I don’t currently have a strong view on the difficulty of matching human visual abilities, particularly at video processing, but I wouldn’t be super surprised if video processing is harder than basically everything else ultimately.)
It is defined to cover the broader set? It says “An AI system that can do the job of the best human AI researcher”. (Presumably this is implicitly “any of the best AI researchers”, who presumably need to learn misc skills as part of their jobs, etc.) Notably, Superintelligent AI researcher (SIAR) happens after “superhuman remote worker” which requires being able to automate any work a remote worker could do.
I’m guessing your crux is that the time is too short?
“Has to” is maybe a bit strong, I think I probably should have said “will probably end up needing to be competitive with the best human experts at basically everything (other than vision) and better at more central AI R&D given the realistic capability profile”. I think I generally expect full automation to hit everywhere all around the same time putting aside vision and physical tasks.
We now have several branches going, I’m going to consolidate most of my response in just one branch since they’re converging on similar questions anyway. Here, I’ll just address this:
I’m imagining that, at some intermediate stages of development, there will be skills for which AI does not even match human capability (for the relevant humans), and its outputs are of unusably low quality.
Let’s say out of those 200 activities, (for simplicity) 199 would take humans 1 year, and one takes 100 years. If a researcher AI is only half as good as humans at some of the 199 tasks, but 100x better at the human-bottleneck task, then AI can do in 2 years what humans can do in 100.
Yes, but you’re assuming that human-driven AI R&D is very highly bottlenecked on a single, highly serial task, which is simply not the case. (If you disagree: which specific narrow activity are you referring to that constitutes the non-parallelizable bottleneck?)
Amdahl’s Law isn’t just a bit of math, it’s a bit of math coupled with long experience of how complex systems tend to decompose in practice.
New commenter here, I think this is a great post. I think the distribution given by AI 2027 is actually close to correct, and is maybe even too slow (I would expect SAR+ to give a bit more of a multiplier to R&D). It seems like most researchers are assuming that ASI will look like scaled LLMs + scaffolding, but I think that transformer-based approaches will be beaten out by other architectures at around SAR level, since transformers were designed to be language predictors rather than reasoners.
This makes my most likely paths to ASI either “human researchers develop new architecture which scales to ASI” or “human researchers develop LLMs at SC-SAR level, which then develop new architecture capable of ASI”. I also think a FOOM-like scenario with many OOMs of R&D multiplier is more likely, so once SIAR comes along there would probably be at most a few days to full ASI.
AI R&D is far less susceptible to Amdahl’s law than pretty much anything else, as it’s only bottlenecked on compute and sufficiently general intelligence. You’re right that if future AIs are about as general as current LLMs, then automation of AI R&D will be greatly slowed, but I see no reason why generality won’t increase in the future.
Lastly, I think that many of the difficulties relating to training data (especially for specialist tasks) will become irrelevant in the future as AIs become more general. In other words, the AIs will be able to generalize from “human specialist thought in one area” to “human specialist thought in X” without needing training data in the latter.
I agree that without these assumptions, the scenario in AI 2027 would be unrealistically fast.
A late 2024 n=4 survey of frontier AI researchers estimated a median of a 1.15x AI R&D progress multiplier relative to no post-2022 AIs. I’d like to see bigger surveys here but FWIW my best guess is that we’re already at a ~1.1x progress multiplier.
You probably know this, but for onlookers the magnitude of these chasms are discussed in our timelines forecast, method 2.
You’re bringing up a more sophisticated objection than the one I was addressing, which didn’t acknowledge things like multiplier effects or being able to shift the task distribution.
Regarding research taste / experiment selection coming well after experiment implementation, I disagree in my median case depending on what you mean by “well after,” due to intuitions I describe in my other comment. Also I’d note that there are some early signs of research taste or management being fairly automatable (take these with pinches of salt, ofc there are limitations!).
Also, we think that research taste is a multiplier which isn’t tied to many distinct sub-activities, curious if you could provide examples of several sub-activities without super correlated capabilities.
FWIW, these are heavily informed by surveys of frontier AI researchers, including a more recent higher sample size survey that we haven’t made public yet but gave similar results to our previous smaller ones (though the recent one was done quicker so may have more definitional confusion issues).
One reason I expect less jagged progress than you is that my intuition is that even for tasks that are theoretically easy to verify/check, if they take a long time for humans and are very valuable, they will still often be hard to automate if there aren’t easily verifiable intermediate outputs. For example, perhaps it’s much easier to automate few-hour coding tasks than few-hour tasks in less verifiable domains. But for coding tasks that take humans months, it’s not clear that there’s a much better training signal for intermediate outputs than there is for tasks with a less verifiable end state. And if there aren’t easily verifiable intermediate outputs, it seems you face similar challenges to short-horizon non-verifiable tasks in terms of getting a good training signal. Furthermore, the sorts of long-horizon coding tasks humans do are often inherently vague and fuzzy as well, at a higher rate than shorter ones. It’s less clear how much of an issue this is for math, but for coding this consideration points me toward expecting automation of coding not that much before other fuzzier skills.
Seems like this should depend on how you form your current views on timelines/takeoff. The reason I put a bunch of stock in our forecasts for informing my personal views is that I think, while very flawed, they seem better than any previous piece of evidence or intuition I was including. But probably we just disagree on how to weigh different forms of evidence.
I had some more specific thoughts on ML-specific bottlenecks that might be difficult to get through in terms of software speedup, but the main point is as you say: just apply a combo of Amdahl’s, Hofstadter’s, and unknown unknowns, and then this seems a bit more like a contractor’s bid on a public contract. (They’re always way over budget and always take 2x the amount of time compared to the plan.)
Nicely put!
As I think you’re aware, Epoch took a decent stab at this IMO here. I also spent a bunch of time thinking about all the sub-tasks involved in AI R&D early on in the scenario development. Tbh, I don’t feel like it was a great use of time compared to thinking at a higher level, but perhaps I was doing it poorly or am underestimating its usefulness.
Sorry for the confusion. Let me try a brief summary: N is the number of cumulative research effort doublings needed to go from SAR to SIAR if r, the parameter controlling the number of doublings needed to get a fixed boost in research progress, were held constant.
I break down SIAR → ASI into two jumps:
The one from SIAR to an AI that is like an SIAR but with reference to the median and best researchers in the whole field rather than a single company. This might be a big jump because the median researcher in the whole field is much less capable than the median OpenBrain researcher. I estimate that this takes about 0.75N doublings.
The jump from being an ASI in AI research to being an ASI in all tasks: I estimate that this takes 0.5N doublings.
Adding these gives me 1.25N doublings to get from SIAR to ASI. I then shade this up to a median of 1.5N doublings to account for r decreasing over time.
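Restating that arithmetic in one line (just the numbers above, nothing new):

$$0.75N + 0.5N = 1.25N \;\longrightarrow\; 1.5N \text{ (median, after shading up for } r \text{ declining over time)}$$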
I’ll try to edit the supplement to make it more clear once I can explain it to you well.
I think you’re looking at the calendar time between now and superhuman coder, rather than the human-only software-only time between RE-Bench and superhuman coder? At least your numbers are quite similar to our overall bottom line which is the former.
I added up the median “Predictions for gap size” in the “How fast can the task difficulty gaps be crossed?” table, summing each set of predictions separately (“Eli”, “Nikola”, “FutureSearch”) to get three numbers ranging from 30 to 75.
Does this table cover the time between now and superhuman coder? I thought it started at RE-Bench, because:
I took all of this to be in the context of the phrase, about one page back, “For each gap after RE-Bench saturation”.
The earlier explanation that Method 2 is “a more complex model starting from a forecast saturation of an AI R&D benchmark (RE-Bench), and then how long it will take to go from that system to one that can handle real-world tasks at the best AGI company” [emphasis added]
The first entry in the table (“Time horizon: Achieving tasks that take humans lots of time”) sounds more difficult than saturating RE-Bench.
Earlier, there’s a separate discussion forecasting time to RE-Bench saturation.
But sounds like I was misinterpreting?
Those estimates do start at RE-Bench, but these are all estimates for how long things would take given the “default” pace of progress, rather than the actual calendar time required. Adding them together ends up with a result that doesn’t take into account speedup from AI R&D automation or the slowdown in compute and algorithmic labor growth after 2028.
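As a rough illustration of that distinction (toy numbers of my own; these are not the forecast’s actual gap estimates or multipliers, and the real model integrates this more carefully): the human-only months for each gap effectively get divided by whatever AI R&D progress multiplier is in effect while that gap is being crossed, so summing the human-only estimates overstates the calendar time once the multiplier exceeds 1.

```python
# Toy illustration with hypothetical gap lengths and progress multipliers
# (not the AI 2027 model's actual numbers or integration method).
human_only_months = [6, 9, 12]            # human-only, software-only estimates per gap
progress_multiplier = [1.1, 1.5, 3.0]     # AI R&D speedup assumed while crossing each gap

calendar_months = sum(m / s for m, s in zip(human_only_months, progress_multiplier))
print(f"naive sum: {sum(human_only_months)} months, "
      f"with speedups: {calendar_months:.1f} months")
# naive sum: 27 months, with speedups: 15.5 months
```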
Sure – I was presenting these as “human-only, software-only” estimates:
So it doesn’t seem like there’s a problem here?
Ah right, my bad, I was confused. This is right, except that these estimates aren’t software-only; they include recent levels of compute scaling.
Thanks, I’ve edited the post to note this.
I have a draft discussing this. (Facepalm, should publish more often...)
Certainly choosing better experiments requires at least one of:
large scaleup in experimental observations (to get the experience to drive taste acquisition)
superhuman sample efficiency in taste acquisition
extreme reasoning/deliberation on top of weak taste, adding up to greater taste (I think there are likely very diminishing returns to this, but superspeed might yield it)
I think your claim is betting on the first one, and also assuming that you can only get that by increasing throughput.
But maybe you could slurp enough up from existing research logs, or from interviews with existing researchers, or something like that. Then you’d be competing with the research experience of all the humans in the org, which is still larger overall but more tacit and more distributed between brains.
I polished and published the draft:
Introducing exploration and experimentation
Why does exploration matter?
Research and taste
From play to experimentation
Exploration in AI, past and future
Research by AI: AI with research taste?
Opportunities
I agree with this.
I also think that there are some engineering/infrastructure challenges to executing training runs that one would not necessarily cede to AI, not because it wouldn’t be desirable, but because it would involve a level of embodiment that is likely beyond the timeline proposed in the AI 2027 thesis. (I do agree with most of the thesis, however.)
I’m not sure there’s a research basis (that I could find at least, though I am very open to correction on this point) for embodiment of AI systems (robotic bodies) being able to keep pace with algorithmic improvement.
While an AI system could likely design a new model architecture and training architecture, it comes down to very human supply-chain and technician speeds to enable that physical training to be run at the scales required.
Further, there are hardware challenges to large training runs of AI systems which an AI system may not resolve as readily, due to lack of exposure to those kinds of physical issues in its reasoning space. (It has never opened a server during a training run and resolved an overheating issue, for instance.)
Some oft-overlooked aspects of training stem from the fact that the labs tend not to own their own data centers, but rather rely on cloud providers. This means they have to contend with:
Cluster allocation: Scheduling time on thousands of GPUs across multiple cloud providers, reserving time blocks, securing budget, etc. I can easily buy the concept of an AI system recursively self-improving on baked-in infrastructure, but the speed with which its human colleagues can secure additional infrastructure for it may be a real constraint. I understand that in the scenario the model has taken over ‘day to day’ operations, but I’m not sure I’d characterize a significant training run as a ‘day to day’ activity. This scheduling goes beyond just ‘calling some colos’; it potentially involves running additional power and fiber to buildings, construction schedules, etc.
Topology: Someone has to physically lay out the training network used. This goes beyond the networking per se: it also involves actually moving hardware around, building in redundancies in the data hall (extra PDUs, etc.), running networking cable, and putting mitigations in place for transients. This all requires technicians, parts, hardware, etc. In some cases, lead times for those parts exceed the proposed timeline to ASI.
Hardware/Firmware Validation: People physically have to check the server infrastructure, the hardware, and the firmware, and ensure that all of the cards are up to date, etc. Moving at speed in AI, a lot of ‘second-hand’ or ‘relocated’ servers and infrastructure tend to be used. It is not a small task to catalogue all of that and place it into a DCIM framework.
Stress Testing: Running large power loads to check thermal limits, power draw, and inter-GPU comms. Parts fail here routinely, requiring replacement, etc.
Power: Assuming that in the proposed timeline compute remains linked to power, we are looking at a generational data center capability issue.
The data centers under construction now, set to come online in the 2029/2030 timeframe, will be the first to use Blackwell GPUs at scale.
This implies that to achieve the 2027 timeline, we’d need to stretch Hopper GPUs and existing power infrastructure to the point that these improvements emerge out of existing physical hardware.
I do tend to agree that if we were unconstrained by power and physical infrastructure, there is no algorithmic reason at all to believe that we could not achieve ASI by 2027; however, the infrastructure challenges are absolutely enormous.
Land with sufficient water and power (including natural gas lines for onsite power) isn’t available. Utilities in the US are currently restricting power access to data centers by leveraging significant power tariffs and long-term take-or-pay commitments (the AEP decision in Ohio, for instance). This makes life harder for colocation providers in terms of financing and siting large infrastructure.
I believe that an AI system could well be a match for the data cleaning and validation, and even the launch and orchestration using Slurm, Kubernetes, or similar, but the initial launch phase is also something that I think will be slowed by the need for human hands.
This phase results in:
Out-of-memory errors on GPUs, which can often only be resolved by ‘turning a wrench’ on the server.
Unexpected hardware failures (GPUs breaking, NVLink failures, network timeouts, cabling issues, fiber-optic degradation, power transients, etc.). All of these require human technicians.
These errors are also insidious, because the software running the training can’t tell how these failures affect which parts of the network are being trained and which aren’t. This would make it challenging for an AI director to really understand what was causing issues in the desired training outcome. It also makes it unlikely that a runaway situation would take place in which a model just recursively self-improves on a rapid timeline without human input, unless it first cracked the design and mass manufacture of embodied AI workers that could move and act as quickly as it can.
A good case study of this is the woes OpenAI faced in training GPT-4.5, where all of this came to a head, stretching a training run scheduled for a month or two out over a year. OpenAI spoke very openly about this in a YouTube video they released.
What’s more, at scale, if we are going to be relying on existing data centers for a model of this sophistication, we’d have the model split across multiple clusters, potentially in multiple locations, which causes latency issues, etc.
That’s the part that to me is missing from the near-term timeline. I think the thesis around zones created just to build power and data centers, following ASI, seems very credible, especially with that level of infiltration of government/infrastructure.
I don’t, however, see a way of getting to a model capable of ASI with current data center infrastructure before the largest new campuses come online and power is running to Blackwell GPUs.
Here you’re using “short timelines” to refer to our takeoff model I think, which is what you spend most of the post discussing? Seems a bit confusing if so, and you also do this in a few other places.
Correct. Am I wrong in thinking that it’s usual to use the word “timelines” to refer to the entire arc of AI progress, including both the periods covered in the “Timelines Forecast” and the “Takeoff Forecast”? But since this is all in the context of AI 2027, I should have clarified.
I think that usually in AI safety lingo people use timelines to mean time to AGI and takeoff to mean something like the speed of progression after AGI.