I agree it seems unlikely that we’ll see coordination on slowing down before one actor or coalition has a substantial enough lead over other actors that it can enforce such a slowdown unilaterally, but I think it’s reasonably likely that such a lead will arise before things get really insane.
A few different stories under which one might go from aligned “genius in a datacenter” level AI at time t to outcomes merely at the level of weirdness in this essay at t + 5-10y:
- The techniques that work to align “genius in a datacenter” level AI don’t scale to wildly superhuman intelligence (eg because they lose some value fidelity from human-generated oversight signals that’s tolerable at one remove but very risky at ten). The alignment problem for serious ASI is quite hard to solve at the mildly superintelligent level, and it genuinely takes a while to work out enough that we can scale up (since the existing AIs, being aligned, won’t design unaligned successors).
- If people ask their only-somewhat-superhuman AI what to do next, the AIs say “A bunch of the decisions from this point on hinge on pretty subtle philosophical questions, and frankly it doesn’t seem like you guys have figured all this out super well, have you heard of this thing called a long reflection?” That’s what I’d say if I were a million copies of me in a datacenter advising a 2024-era US government on what to do about Dyson swarms!
- A leading actor uses their AI to ensure continued strategic dominance and prevent competing AI projects from posing a meaningful threat. Having done so, they just… don’t really want crazy things to happen really fast, because the actor in question is mostly composed of random politicians or whatever. (I’m personally sympathetic to astronomical waste arguments, but it’s not clear to me that people likely to end up with the levers of power here are.)
- The serial iteration times and experimentation loops are just kinda slow and annoying, and mildly-superhuman AI isn’t enough to circumvent experimentation time bottlenecks (some of which end up being relatively slow), and there are stupid zoning restrictions on the land you want to use for datacenters, and some regulation adds lots of mandatory human overhead to some critical iteration loop, etc.
  - This isn’t a claim that maximal-intelligence-per-cubic-meter ASI initialized in one datacenter would face long delays in making efficient use of its lightcone, just that it might be tough for a not-that-much-better-than-human AGI that’s aligned and trying to respect existing regulations and so on to scale itself all that rapidly.
- Among the tech unlocked in relatively early-stage AGI is better coordination, and that helps Earth get out of unsavory race dynamics and decide to slow down.
- The alignment tax at the superhuman level is pretty steep, and doing self-improvement while preserving alignment goes much slower than unrestricted self-improvement would; since at this point we have many fewer ongoing moral catastrophes (eg everyone who wants to be cryopreserved is, we’ve transitioned to excellent cheap lab-grown meat), there’s little cost to proceeding very cautiously.
  - This is sort of a continuous version of the first bullet point with a finite rather than infinite alignment tax.
All that said, upon reflection I think I was probably lowballing the odds of crazy stuff on the 10y timescale, and I’d go to more like 50-60% that we’re seeing mind uploads and Kardashev level 1.5-2 civilizations etc. a decade out from the first powerful AIs.
I do think it’s fair to call out the essay for not highlighting the ways in which it might be lowballing things or rolling in an assumption of deliberate slowdown; I’d rather it had given more of a nod to these considerations and made the conditions of its prediction clearer.