Against “argument from overhang risk”
Epistemic status: I wrote this in August 2023, got some feedback I didn’t manage to incorporate very well, and then never published it. There’s been less discussion of overhang risk recently but I don’t see any reason to keep sitting on it. Still broadly endorsed, though there’s a mention of a “recent” hardware shortage which might be a bit dated.
I think arguments about the risks of overhangs are often unclear about what type of argument is being made. Various types of arguments I’ve seen include:
1. Pausing is net-harmful in expectation because it would cause an overhang, which [insert further argument here]
2. Pausing is less helpful than [naive estimate of helpfulness] because it would cause an overhang, which [insert further argument here]
3. We shouldn’t spend effort attempting to coordinate or enforce a pause, because it will cause [harmful PR effects]
4. We shouldn’t spend effort attempting to coordinate or enforce a pause, because that effort can be directed toward [different goal], which is more helpful (conditional on the proponent’s implicit model of the world)
I think it’s pretty difficult to reason about e.g. (4) without having good object-level models of the actual trade-offs involved for each proposed intervention, and so have been thinking about (1) and (2) recently. (I haven’t spent much time thinking about (3); while I’ve seen some arguments in that direction, most have been either obvious or trivially wrong, rather than wrong in a non-obvious way.)
I think (1) generally factors into two claims:
Most alignment progress will happen from studying closer-to-superhuman models
The “rebound” from overhangs after a pause will happen so much faster than the continuous progress that would’ve happened without a pause, that you end up behind on net in terms of alignment progress at the same level of capabilities
I want to put aside the first claim and argue that the second is not obvious.
Heuristic
Arguments that overhangs are so bad that they outweigh the effects of pausing or slowing down are basically arguing that a second-order effect is larger than the first-order effect. This is sometimes true, but before you’ve screened this consideration off by examining the object level, I think your prior should be against.
Progress Begets Progress
I think an argument of the following form is not crazy: “All else equal, you might rather smoothly progress up the curve of capabilities from timestep 0 to timestep 10, gaining one capability point per timestep, than put a lid on progress at 0 and then rapidly gain 10 capability points between timesteps 9 and 10, because that gives you more time to work on (and with) more powerful models”.
But notice that this argument has several assumptions, both explicit and implicit:
You know that dangerous capabilities emerge when you have 10 capability points, rather than 9, or 7, or 2.
You would, in fact, gain 10 capability points in the last timestep if you stopped larger training runs, rather than some smaller number of points.
You will learn substantially more valuable things from more powerful models. From another frame: you will learn anything at all that generalizes from the “not dangerous” capabilities regime to “dangerous” capabilities regime.
The last assumption is the most obvious, and is the source of significant disagreement between various parties. But I don’t think the claim that overhangs make pausing more-dangerous-than-not survives, even if you accept that assumption for the sake of argument.
First, a pause straightforwardly buys you time in many worlds where counterfactual (no-pause) timelines were shorter than the duration of the pause. The less likely you think it is that we reach ASI in the next n years, the less upside there is to an n-year pause.
Second, all else is not equal, and it seems pretty obvious that if we paused large training runs for some period of time, and then unpaused them, we should not (in expectation) see capabilities quickly and fully “catch up” to where they’d be in the counterfactual no-pause world. What are the relevant inputs for pushing the capabilities frontier?
Hardware (GPUs/etc)
Algorithms
Supporting infrastructure (everything from “better tooling for scaling large ML training runs” to “having a sufficient concentration of very smart people in one place, working on the same problem”)
Approximately all relevant inputs into pushing the capabilities frontier would see less marginal demand if we instituted a pause. Many of them seem likely to have serial dependency trees in practice, such that large parts of projected overhangs may take nearly as long to traverse after a pause as they would have without a pause.
If we institute a pause, we should expect to see (counterfactually) reduced R&D investment in improving hardware capabilities, reduced investment in scaling hardware production, reduced hardware production, reduced investment in research, reduced investment in supporting infrastructure, and fewer people entering the field.
These are all bottlenecks. If it were the case that a pause only caused slowdown by suppressing a single input (e.g. hardware production), while everything else continued at the same speed, then I’d be much less surprised to see a sharp spike in capabilities after the end of a pause (though this depends substantially on which input is suppressed). To me, the recent hardware shortage, which happened without any pause, is strong evidence that a pause would not create an overhang that eliminates all or nearly all bottlenecks to reaching ASI, and so I do not expect a sharp jump in capabilities after a pause ends.
Also relevant is the actual amount of time you expect it to take to “eat” an overhang. Some researchers seem to be operating on a model where most of our progress on alignment will happen in a relatively short window of time before we reach ASI—maybe a few years at best. While I don’t think this is obviously true, it is much more likely to be true if you believe that we will “learn substantially more valuable things from more powerful models”[1].
If you believe that things will move quickly at the end, then an overhang seems most harmful if it takes substantially less time to eat the overhang than the counterfactual “slow takeoff” period would have lasted. If there’s a floor on how long it takes to eat the overhang, and it’s comparable to or longer than the counterfactual “slow takeoff” period, then you don’t lose any of the precious wall-clock time working with progressively more powerful models that you need to successfully scale alignment. But given that you already believe the counterfactual “slow takeoff” period is not actually that slow (maybe on the order of a couple of years), you would need to think that the overhang will get eaten very quickly (on the order of months, or maybe a year at the outside) to be losing much time. As argued above, I don’t think we’re very likely to be able to eat any sort of meaningful overhang that quickly.
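To make this timing comparison concrete, here is a minimal toy sketch (with made-up illustrative numbers, not forecasts) of how much wall-clock time you get with progressively more powerful models under each scenario:

```python
# Toy comparison, not a forecast: all numbers below are illustrative assumptions.
# Question: how many years do you get to work with progressively more powerful
# (near-dangerous) models before reaching ASI?

slow_takeoff_years = 2.0   # assumed length of the counterfactual no-pause "slow takeoff"
rebound_slow_years = 2.0   # assumed time to eat the overhang if bottlenecks persist
rebound_fast_years = 0.5   # assumed time to eat the overhang if bottlenecks vanish

# No pause: you get the whole slow-takeoff period.
print("no pause:           ", slow_takeoff_years, "years")

# Pause followed by a still-bottlenecked rebound: comparable time, little lost.
print("pause, slow rebound:", rebound_slow_years, "years")

# Pause followed by a very fast rebound: most of that time is lost.
print("pause, fast rebound:", rebound_fast_years, "years ->",
      slow_takeoff_years - rebound_fast_years, "years lost")
```

The “fast rebound” row only happens if nearly every input to capabilities is sitting idle and ready when the pause ends, which is exactly what the bottleneck discussion above argues against.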
I haven’t spent much time thinking about situations where things move relatively slowly at the end, but I have a few main guesses for what world models might generate that belief:
1. Sharply diminishing returns to intelligence
2. No meaningful discontinuities in “output” from increased intelligence
3. Incremental alignment of increasingly powerful models allows us to very strongly manage (and slow down) our climb up the capabilities curve
(1) seems implausible to me, but also suggests that risk is not that high in an absolute sense. If you believe this, I’m not sure why you’re particularly concerned about x-risk from AI.
(2) seems contradicted by basically all the empirical evidence we have available to us on the wildly discontinuous returns to intelligence even within the relatively narrow human span of intelligence; I don’t really know how to bridge this gap.
(3) doesn’t seem like it’d work unless you’re imagining dramatically more global governance/coordination success than I am (and also makes a lot of assumptions about future alignment successes).
Putting those objections aside, someone who believes (1) doesn’t seem like they should be particularly concerned with overhang risk (though they also shouldn’t see much benefit to pausing). Someone who believes (2) or (3) might reasonably be concerned by overhang risk; (3) in particular depends strongly on careful management of capabilities growth.
Conclusion
To sum up, my understanding of the arguments against pausing suggests that they depend on an upstream belief: that having enough “well-managed” wall-clock time with progressively more powerful models is an important or necessary factor in succeeding at aligning ASI.
I argue that an overhang is unlikely to be eaten so quickly that you lose time compared to how much time you’d otherwise have during a slow takeoff. I assert, without argument, that a “very very slow / no takeoff” world is implausible.
I may be misunderstanding or missing the best arguments for why overhangs should be first-order considerations when evaluating pause proposals. If I have, please do leave a comment.
Thanks to Drake Thomas for the substantial feedback.
[1] Remember that this belief suggests correspondingly more pessimistic views about the value of pausing. There may be arguments against pausing that don’t rely on this, but in practice, I observe that those concerned by AI x-risk who think that pausing is harmful or not very helpful on net tend to arrive at that belief because it’s strongly entailed by believing that we’d be bottlenecked on alignment progress without being able to work with progressively more powerful models.
I don’t follow the reasoning here. Shouldn’t a hardware shortage be evidence we will see a spike after a pause?
For example, suppose we pause now for 3 years, and during that time NVIDIA releases the RTX 5090, 6090, and 7090, produced using TSMC’s 3nm, 2nm and 10a processes. Then the amount of compute available at the end of the three-year pause will be dramatically higher than it is today (for reference, the 4090 is 4x better at inference than the 3090). Roughly speaking, then, after your 3-year pause a billion-dollar investment will buy 64x as much compute (this is more than the difference between GPT-4 and GPT-3).
Also, a “pause” would most likely only be a cap on the largest training runs. It is unlikely that we’re going to pause all research on current LLM capabilities. Consider that a large part of the “algorithmic progress” in LLM inference speed is driven not by SOTA models, but by hobbyists trying to get LLMs to run faster on their own devices.
This means that in addition to the 64x hardware improvement, we would also get algorithmic improvement (which has historically been faster than hardware improvement).
That means at the end of a 3-year pause, an equal-cost run would be not 64x but 4096x larger.
Finally, LLMs have already reached the point where they can reasonably be expected to speed up economic growth. Given that their economic value will become more obvious over time, the longer we pause, the more we can expect the largest actors to be willing to spend on a single run. It’s hard to put an estimate on this, but consider that historically the largest runs have been increasing at 3x/year. Even if we conservatively estimate 2x per year, that gives us an additional 8x, for a total factor of 32k at the end of our 3-year pause.
Even if you don’t buy that “Most alignment progress will happen from studying closer-to-superhuman models”, surely you believe that “large discontinuous changes are risky” and a factor of 32,000x is a “large discontinuous change”.
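For concreteness, here is a minimal sketch of the compounding arithmetic above; every multiplier is an assumption stated in this comment, not a measured value:

```python
# Rough compounding estimate from the comment above; all multipliers are
# assumptions from the comment, not measured values.

years_paused = 3

# Hardware: ~4x per GPU generation (by analogy to 3090 -> 4090 inference),
# with roughly one new generation per year of the pause.
hardware_gain = 4 ** years_paused       # 4^3 = 64x

# Algorithms: assumed to improve at least as fast as hardware over the pause.
algorithmic_gain = 64                   # another ~64x

# Willingness to spend: conservatively 2x/year vs. the historical ~3x/year.
spend_gain = 2 ** years_paused          # 2^3 = 8x

total = hardware_gain * algorithmic_gain * spend_gain
print(total)  # 64 * 64 * 8 = 32768, i.e. the "factor of 32k" above
```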
We ran into a hardware shortage during a period of time where there was no pause, which is evidence that the hardware manufacturer was behaving conservatively. If they’re behaving conservatively during a boom period like this, it’s not crazy to think they might be even more conservative in terms of novel R&D investment & ramping up manufacturing capacity if they suddenly saw dramatically reduced demand from their largest customers.
This and the rest of your comment seem to have ignored the rest of my post (see: multiple inputs to progress, all of which seem sensitive to “demand” from e.g. AGI labs), so I’m not sure how to respond. Do you think NVIDIA’s planning is totally decoupled from anticipated demand for their products? That seems kind of crazy, but that’s the scenario you seem to be describing. Are big labs just going to continue to increase their willingness-to-spend along a smooth exponential for as long as the pause lasts? What if the pause lasts 10 years?
If you think my model of how inputs to capabilities progress are sensitive to demand for those inputs from AGI labs is wrong, then please argue so directly, or explain how your proposed scenario is compatible with it.
Alternative hypothesis: there are physical limits on how fast you can build things.
Also, NVIDIA currently has a monopoly on “decent AI accelerator you can actually buy”. Part of the “shortage” is just the standard economic result that a monopoly produces less of something to increase profits.
This monopoly will not last forever, so in that sense we are currently in hardware “underhang”.
Nvidia doesn’t just make AGI accelerators. They are a video game graphics card company.
And even if we pause large training runs, demand for inference of existing models will continue to increase.
This is me arguing directly.
The model “all demand for hardware is driven by a handful of labs training cutting edge models” is completely implausible. It doesn’t explain how we got the hardware in the first place (video games) and it ignores the fact that there exist uses for AI acceleration hardware other than training cutting-edge models.
Only if you pause everything that could bring ASI. That is, hardware, training runs, basic science on learning algorithms, brain studies, etc.
This seems non-responsive to arguments already in my post:
If you are referring to this:
This seems like an extreme claim to me (if these effects are argued to be meaningful), especially “fewer people entering the field”! Just how long do you think a pause would need to last to make fewer people enter the field? I would expect that not only would the pause have to last, say, 5+ years, but there would also have to be a worldwide expectation that it would go on for longer, to actually put people off.
Because of flow-on effects and existing commitments, reduced hardware R&D investment wouldn’t start for a few years either. It’s not clear that it will meaningfully happen at all if we also want to deploy existing LLMs everywhere. For example, in robotics I expect there will be substantial demand for hardware even without AI advances, as our current capabilities haven’t been deployed there yet.
As I have said here, and probably in other places, I am quite a bit more in favor of going directly for a hardware pause, specifically on the most advanced hardware. I think it is achievable, impactful, and with clearer positive consequences (and fewer unintended negative ones) than targeting training runs of an architecture that already seems to be showing diminishing returns.
If you must go after FLOPS for training, then build in large factors of safety for architectures/systems that are substantially different from what is currently done. I am not worried about unlimited FLOPS on GPT-X, but could be for >100x less on something that clearly looks like it has very different scaling laws.
The other issue with pausing training of large foundation models (the type of pause we might actually achieve) is that it could change the direction of AGI research toward a less-alignable type of first AGI.
This is related to but different from the direction argument made by Russel Thor in that comment thread. You don’t have to think that LLMs are an inefficient path to AGI to worry about changing the direction. LLMs might be an equally or more efficient path that happens to be safer. Indeed, I think it is. The instructability and translucency of LLM cognition make them an ideal base model out of which to build “real” agentic, self-improving AGI that can be aligned to human intentions and whose cognitive processes can be understood and monitored relatively well.
On the other hand, pausing new training runs might redirect a lot of effort into fleshing out current foundation models into more useful cognitive architectures. These systems seem like our best chance of alignment. Progress on foundation models might remove the fairly tight connection between LLM cognition and the language they emit. That would make them less “translucent” and thereby less alignable. So that’s a reason to favor a pause.
Neither of those arguments is commonly raised; I have yet to write a post about them.
Another perspective.
If, like me, you believe there is a >90% chance that the current LLM approach is plateauing, then your cost/benefit for pausing large training runs is different. I believe that current AI lacks something like the generalization power of the human brain; this can be seen in how Tesla Autopilot has needed >10,000x the training data of a person and is still not at human level. This could potentially be overcome by a better architecture, or could require different hardware as well because of the Von Neumann bottleneck. If this is the case, then a pause on large training runs can hardly be helpful. I believe that if LLMs are not an X-risk, then their capabilities should be fully explored and integrated quickly into society to provide defense against more dangerous AI. It is a radically improved architecture or hardware that you should be worried about.
Three potential sources of danger:
1. Greatly improved architecture
2. Large training run with current arch
3. Greatly improved HW
We are paying more attention to (2), when to me it is the least impactful of the three and could even hurt. There are obvious ways this can hurt the cause:
If such training runs are not dangerous then the AI safety group loses credibility.
It could give a false sense of security when a different arch requiring much less training appears and is much more dangerous than the largest LLM.
It removes the chance to learn alignment and safety details from such large LLMs.
A clear path to such a better arch is studying neurons. Whether this comes from Dishbrain, progress in neural interfaces, brain scanning, or something else, I believe it is very likely that by 2030 we will have understood the brain/neural algorithm, characterized it pretty well, and of course have the ability to attempt to implement it in our hardware.
So in terms of pauses, I think one targeted towards chip factories is better. It is achievable and it is clear to me that if you delay a large factory opening by 5 years, then you can’t make up the lost time in anything like the same way for software.
Stopping (1) seems impossible, i.e. “Don’t study the human brain” seems likely to backfire. We would of course like some agreement that if a much better arch is discovered, it isn’t immediately implemented.
This seems to be arguing that the big labs are doing some obviously-inefficient R&D in terms of advancing capabilities, and that government intervention risks accidentally redirecting them towards much more effective R&D directions. I am skeptical.
I’m not here for credibility. (Also, this seems like it only happens, if it happens, after the pause ends. Seems fine.)
I’m generally unconvinced by arguments of the form “don’t do [otherwise good thing x]; it might cause people to let their guard down and get hurt by [bad thing y]” that don’t explain why they aren’t a fully-general counterargument.
If you think LLMs are hitting a wall and aren’t likely to ever lead to dangerous capabilities, then I don’t know why you expect to learn anything particularly useful from the much larger LLMs that we don’t have yet, but not from those we do have now.
In terms of the big labs being inefficient: with hindsight, perhaps. Anyway, I have said that I can’t understand why they aren’t putting much more effort into Dishbrain etc. If I had ~$1B and wanted to get ahead on a 5-year timescale, I would assign that direction more probability and expected value.
I am here for credibility. I am sufficiently confident that they are not an X-risk that I don’t want to recommend stopping. I want the field to have credibility for later.
Yes, but I don’t think stopping the training runs is much of an otherwise good thing, if it is one at all. To me it seems more like inviting a fire safety expert who recommends a smoke alarm in your toilet but not your kitchen. If we can learn alignment stuff from such training runs, then stopping is an otherwise bad thing.
OK, I’m not up on the details, but some experts certainly think we learned a lot from 3.5/4.0. There is also my belief that it is often a good idea to deploy the most advanced non-X-risk AI as defense. (This is somewhat unclear; usually what doesn’t kill you makes you stronger, but I am concerned about AI companions/romantic partners etc. That could weaken society in a way that makes it more likely to make bad decisions later. But that seems to have already happened, and very large models being centralized could be secured against more capable/damaging versions.)
You make some excellent points.
I think the big crux here is what sort of pause we implement. If we thoroughly paused AI research, I think your logic goes through; only the hardware improves meaningfully. But that’s pretty unrealistic.
If it’s only a pause in training the largest models, and all other research continues unabated, then I think we get 90% of the problem with overhang. Algorithmic efficiency and hardware availability will have increased almost as much as they would have otherwise, as Logan argues. There’s a bit less research into AI, but almost the same amount, since current LLMs are already economically valuable.
(There’s even a curious increased reason to invest in AI research: the effective lead of OAI, GDM and Anthropic is lessened; new companies can catch up to their current knowledge to race them when the pause is lifted, and to pursue other directions that aren’t paused).
I very much agree that demand for new hardware from training frontier models will not change dramatically with a pause. The limited supply chain and expertise are the limiting factor in hardware improvements, with demand changes playing a smaller role (not insignificant, just smaller).
Since a limited pause on training large models is the only one we can even really hope to get, I’d have to say that the overhang argument for pause remains pretty strong.