The whole reason I think we should pause is that, sooner or later, we will hit a threshold where, if we do not pause, we literally die, basically immediately (or, like, a couple months later in a way that is hard to find and intervene on) and it doesn’t matter whether pausing has downsides.
(Where “pause” means “pause further capability developments that are more likely to produce the kinds of agentic thinking that could result in recursive self-improvement.”)
So for me the question is “do we literally pause at the last possible second, or, sometime beforehand.” (I doubt “last possible second” is a reasonable choice, if it’s necessary to pause “at all”, although I’m much more agnostic about “we should literally pause right now” vs “right at the moment we have minimum-viable-AGI but before recursive self-improvement can start happening” vs “2-6 months before MVP AGI”)
I’m guessing you probably disagree that there will be a moment where, if we do not pause, we literally die?
Short answer: yes, I disagree. I also don’t think we are safe if we do pause. That additional fact makes pausing seem less optimal.
Long answer:
I think I need to break this concept down a bit more. There are a variety of places where one might consider pausing, and a variety of consequences that I think could follow at each of those places.
Pre-AGI, dangerous tool AI
This is where we are now. A combination of tool AI and the limited LLMs we have can together provide pretty substantial uplift to a terrorist org attempting to wipe out humanity. Not an x-risk, but it could certainly kill 99% of humanity. A sensible civilization would have paused before we got this far, and put appropriate defenses in place before proceeding. We still have a chance to build defenses, and should do so ASAP. Civilization is not safe. Our existing institutions are apparently failing to protect us from the substantial risks we are facing.
Assistant AGI
Weak AGI is being trained. With sufficient scaffolding, this will probably speed up AI research substantially, but isn’t enough to fully automate the whole research process. It can probably be deployed safely behind an API without dooming us all. If it follows the pattern of LLMs so far, it will be a system with tons of memorized facts but subpar reasoning skills, failing to integrate the logical implications of combinations of those facts. The facts are scattered and disconnected.
If we were wise, we’d stop here, accept this substantial speed-up to our research capabilities, and try to get civilization into a safe state. If others are racing, though, the group at this level is going to be strongly tempted to proceed. This is not yet enough AI power to assure a decisive technological-economic victory for the leader.
If this AGI escaped, we could probably catch it before it self-improved much at all, and it would likely pose no serious (additional) danger to humanity.
Full Researcher AGI
Enough reasoning capability added to the existing LLMs and scaffolding systems (perhaps via some non-LLM architecture) that the system can now reason and integrate facts at least as well as a typical STEM scientist.
I don’t expect that this kills us or escapes its lab, if it is treated cautiously. I think Buck & Ryan’s control scheme works well to harness its capabilities without allowing harms, with only a moderate safety tax.
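(As a rough illustration of the kind of control scheme I have in mind, here is a minimal sketch of a trusted-monitoring loop: an untrusted, more capable model does the work, a weaker trusted model scores each output for suspiciousness, and the most suspicious fraction gets routed to human audit. The function names, interfaces, and threshold below are my own hypothetical placeholders, not anything from Buck & Ryan’s actual proposal.)

```python
# Illustrative sketch of a trusted-monitoring control loop.
# All model calls and the audit threshold are hypothetical placeholders.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    prompt: str


def untrusted_model(task: Task) -> str:
    """The capable researcher AGI whose outputs we do not fully trust."""
    raise NotImplementedError  # stand-in for the real model call


def trusted_monitor(task: Task, output: str) -> float:
    """A weaker, trusted model returning a suspicion score in [0, 1]."""
    raise NotImplementedError


def human_audit(task: Task, output: str) -> bool:
    """Expensive human review; True means the output is safe to use."""
    raise NotImplementedError


AUDIT_THRESHOLD = 0.98  # audit only the most suspicious outputs (the safety-tax budget)


def run_with_control(task: Task) -> Optional[str]:
    output = untrusted_model(task)
    if trusted_monitor(task, output) >= AUDIT_THRESHOLD:
        # Route the most suspicious outputs to human auditors; discard if rejected.
        return output if human_audit(task, output) else None
    return output
```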
The speed-up to AI research could now truly be called RSI. The time to the next level up, the more powerful version, may be only a few months if the research proceeds at full speed. This is where we would depend on organizational adequacy to protect us, to restrain the temptation of some researchers to accelerate at full speed. This is where we have the chance to instead turn the focus fully onto alignment, defensive technology acceleration, and human augmentation. This is probably enough AI power to grant the leader decisive economic/technological/military power and allow them to take action to globally halt further racing towards AI. In other words, this is the first point at which I believe we could safely ‘pause’, although given that the ‘pause’ would look like pretty drastic actions and substantial use of human-level AI, I don’t think the ‘pause’ framing quite fits. Civilization can quickly be made safe.
If released into the world, it could probably evolve into something that could wipe out humanity. It could be quite hard to catch and delete in time, if it couldn’t be proven to the relevant authorities that the risk was real and required drastic action. Not a definite game-over, though, just high risk.
Mildly Superhuman AGI (weak ASI)
Not only knows more about the world than any human ever has, but also integrates this information and reasons about it more effectively than any human could. Pretty dangerous even in the lab, but it could still be controlled safely via careful schemes: for instance, keeping it impaired with slow-downs, deliberately censored and misleading datasets/simulations, and noise injection into its activations. Deploying it safely would likely require so much impairment that it wouldn’t end up any more useful than the merely-high-human-level AGI.
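(To make “noise injection into its activations” concrete, here is a minimal sketch assuming a PyTorch model; which layers to perturb and the noise scale are arbitrary illustrative assumptions, not a claim about what a lab would actually do.)

```python
# Sketch of activation noise injection as a deliberate capability impairment.
# The model, the choice of layers to perturb, and the noise scale are illustrative.
import torch
import torch.nn as nn


def add_noise_hooks(model: nn.Module, noise_std: float = 0.05):
    """Register forward hooks that add Gaussian noise to every Linear layer's output."""
    handles = []
    for module in model.modules():
        if isinstance(module, nn.Linear):
            def hook(mod, inputs, output, std=noise_std):
                # Returning a value from a forward hook replaces the layer's output.
                return output + std * torch.randn_like(output)
            handles.append(module.register_forward_hook(hook))
    return handles  # keep these so the hooks can later be removed with handle.remove()


# Usage with a small stand-in model:
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
hooks = add_noise_hooks(model, noise_std=0.1)
perturbed_out = model(torch.randn(1, 16))  # every forward pass now sees noisy activations
```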
There would be huge temptation to relax the deliberate impairments and harness the full power. Organizational insufficiency could strike here, failing to sufficiently restrain the operators.
If it escaped, it would definitely be able to destroy humanity within a fairly short time frame.
Strongly Superhuman
By the time you’ve relaxed your impairment and control measures enough to even measure how superhuman it is, you are already in serious danger from it. This is the level where you would need to seriously worry about the lab employees getting mind-hacked in some way, or the lab compute equipment getting hacked. If we train this model, and even try to test it at full power, we are in great danger.
If it escapes, it is game over.
Given these levels, I think you are technically correct that there is a point where, if we don’t pause, we are pretty much doomed. But I think that pause point is somewhere above human-level AGI.
Furthermore, I think that attempting to pause before we get to ‘enough AI power to make humanity safe’ leads to us actually putting humanity at greater risk.
How?
By continuing in our current state of great vulnerability to self-replicating weapons like bioweapons.
By putting research pressure on the pursuit of more efficient algorithms, resulting in a much more distributed, harder-to-control progress front proceeding nearly as fast toward superhuman AGI. I think this rerouting would put us in far more danger of a containment breach, or of overshooting the target level of useful-but-not-suicidally-dangerous.
I don’t have a deep, confident argument that we should pause before ‘slightly superhuman level’, but I do think we will need to pause then, and I think getting humanity ready to pause takes like 3 years, which is also what my earliest possible timelines are. So I think we need to start laying the groundwork now so that, even if you think we shouldn’t pause till later, we are civilizationally ready to pause quite abruptly.
Well, it seems we are closer to agreement than we thought about when a pause could be good. I am unsure about the correct way to prep for the pause. I do think we are about 2-4 years from human-level AI, and could get to above-human-level within a year after that if we so chose.
What makes you say 3 years?
It’s more of an intuitive guess than anything particularly rigorous, but, like, it takes time for companies and nation-states and international communities to agree to things, we don’t seem anywhere close, there will be political forces opposing the pause, and 3 years seems like a generously short time, even if we got moderately lucky, to get all the necessary actors to pause in a stable way.