I agree that pausing for 10 years would be difficult. However, I think even a 1-year pause would be GREAT and would substantially improve humanity’s chances of staying in control and surviving. In practice I expect ‘you can’t pause forever, what about Bitcoin miners and North Korea’ to be used as just one more in a long list of rationalizations for why we shouldn’t pause at all.
I do think that we shouldn’t pause. I agree that we need to end this corporate race towards AGI, but I don’t think pausing is the right way to do it.
I think we should nationalize, then move ahead cautiously but determinedly. Not racing, not treating this as just normal corporate R&D. I think the seriousness of the risks, and the huge potential upsides, means that AI development should be treated more like nuclear power and nuclear weapons. We should control and restrict it. Don’t allow even API access to non-team members. Remember when we talked about keeping AI in a box for safety? What if we actually did that?
I expect that nationalization would slow things down, especially at first during the transition. I think that’s a good thing.
I disagree that creating approximately human-level AGI in a controlled lab environment is a bad thing. There are a lot of risks, but the alternatives are also risky.
Risks of nationalization:
Model and/or code could get stolen or leaked.
Word of its existence could spread, and encourage others to try to catch up.
It would be more useful for weapons development and other misuse.
It could be used for recursive self-improvement (RSI), resulting in:
a future model so superintelligent that it can escape even from the controlled lab,
algorithmic improvements that make training much, much cheaper (a secret that would then need to be kept from leaking)
Benefits of nationalization:
We would have a specimen of true AGI to study in the lab.
We could use it, even without robust alignment, via a control strategy like Buck’s/Ryan’s ideas (a minimal sketch of that kind of monitoring loop follows this list).
We could use it for automated alignment research. Including:
creating synthetic data
creating realistic simulations for training and testing, with multiagent interactions, censored training runs, honeypots to catch deception or escape attempts, etc
parallel exploration of many theories with lots of very fast workers
exploration of a wider set of possible algorithms and architectures, to see if some are particularly safe (or hazardous)
We could use it for non-AI R&D:
this could help with defensive acceleration of defense-dominant technology, protecting the world from bioweapons, ICBMs, etc.
this would enable beneficial rapid progress in many intellect-bottlenecked fields like medicine.
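To make the control-strategy idea above a bit more concrete: the rough shape is that a capable but untrusted model does the work, a weaker trusted model scores each output for suspicion, and a limited human-audit budget catches the suspicious ones. Here is a minimal sketch of that loop. It is a toy under assumed interfaces, not the actual protocol from Buck’s/Ryan’s work; `untrusted_model`, `trusted_monitor`, `human_audit`, and `execute` are hypothetical stand-ins.

```python
# Toy control loop: the untrusted model proposes, a trusted monitor scores the
# proposal for suspicion, and high-suspicion proposals are escalated to a human
# audit before anything is executed. All interfaces here are assumptions.

def run_task_with_control(task, untrusted_model, trusted_monitor,
                          human_audit, execute, threshold=0.8):
    proposal = untrusted_model(task)             # capable but not fully trusted
    suspicion = trusted_monitor(task, proposal)  # weaker, trusted model: 0.0-1.0
    if suspicion >= threshold:
        if not human_audit(task, proposal):      # expensive; budget-limited
            return None                          # block (and log) the proposal
    return execute(proposal)                     # safety tax = monitor + audits
```

The threshold is the main knob: set it so the expected number of escalations stays within whatever human-audit capacity the project can afford.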
Nationalization would enable:
preventing AI experts from leaving the country even if they decided not to work for the government project,
removing restrictions (and adding incentives) on immigration for AI experts,
not being dependent on the whims of corporate politics for the safety of humanity,
not needing to develop and deploy a consumer product (a distraction from the true quest of alignment), and not needing to worry about profits vs. expenditures,
removing divisions between the top labs by placing them all in the government project, preventing secret-hoarding of details important for alignment
Risks of Pausing:
There is still a corporate race on; it just has to work around the rules of the pause now. This creates enormous pressure on AI experts to find ways of improving the outputs of AI without breaking the limits. This is especially concerning in the case of compute limits, since it explicitly pushes research in the direction of searching for algorithms that would allow things like:
incremental / federated training methods that break up big training runs into sub-limit pieces, or allow for better combinations of prior models
the search for much more efficient algorithms that can work on much less data and training compute. I am confident that huge gains are possible here, and that they would be found quickly. There is enough detail here that this point could become a post all on its own; I am reluctant to post those thoughts publicly, though, since they might contribute to the thing I’m worried about actually happening. Such developments would gravely undercut the new compute restrictions and raise the risk of wider proliferation into the hands of many smaller actors.
the search for loopholes in the regulation or flaws in the surveillance and enforcement that allow for racing while appearing not to race
the continuation of many corporate race dynamics that are bad for safety, like pressure for espionage and employee-poaching,
lack of military-grade security at AI developers. Many different independent corporate systems each present their own set of vulnerabilities, and dangerous innovation could occur in relatively small and insecure companies, then get stolen or leaked.
any employee can decide to quit and go off to start their own project, spreading tech secrets and changing control structures. The companies have no power to stop this (as opposed to a nationalized project).
If large training runs are banned, this reduces the incentive to work for the big companies, and smaller companies will seem more competitive and tempting.
The whole reason I think we should pause is that, sooner or later, we will hit a threshold where, if we do not pause, we literally die, basically immediately (or, like, a couple of months later, in a way that is hard to find and intervene on), and at that point it doesn’t matter whether pausing has downsides.
(Where “pause” means “pause further capability developments that are more likely to produce the kinds of agentic thinking that could result in recursive self-improvement.”)
So for me the question is “do we literally pause at the last possible second, or sometime beforehand?” (I doubt “last possible second” is a reasonable choice, if it’s necessary to pause at all, although I’m much more agnostic about “we should literally pause right now” vs. “right at the moment we have minimum-viable AGI but before recursive self-improvement can start happening” vs. “2-6 months before MVP AGI”.)
I’m guessing you probably disagree that there will be a moment where, if we do not pause, we literally die?
Short answer: yes, I disagree. I also don’t think we are safe if we do pause. That additional fact makes pausing seem less optimal.
Long answer:
I think I need to break this concept down a bit more. There are a variety of points at which one might consider pausing, and a variety of consequences that I think could follow at each of them.
Pre-AGI, dangerous tool AI: this is where we are now. A combination of tool AI and the limited LLMs we have can together provide pretty substantial uplift to a terrorist org attempting to wipe out humanity. Not an x-risk, but it could certainly kill 99% of humanity. A sensible civilization would have paused before we got this far and put appropriate defenses in place before proceeding. We still have a chance to build defenses, and should do so ASAP. Civilization is not safe. Our existing institutions are apparently failing to protect us from the substantial risks we are facing.
Assistant AGI
Weak AGI is being trained. With sufficient scaffolding, this will probably speed up AI research substantially, but it isn’t enough to fully automate the whole research process. This can probably be deployed safely behind an API without dooming us all. If it follows the pattern of LLMs so far, this is a system with tons of memorized facts but subpar reasoning skills and little integration of the logical implications of combining those facts; the facts are scattered and disconnected.
If we were wise, we’d stop here, accept this substantial speed-up to our research capabilities, and try to get civilization into a safe state. If others are racing, though, the group at this level is going to be strongly tempted to proceed. This is not yet enough AI power to assure a decisive technological-economic victory for the leader.
If this AGI escaped, we could probably catch it before it self-improved much at all, and it would likely pose no serious (additional) danger to humanity.
Full Researcher AGI
Enough reasoning capability added to the existing LLMs and scaffolding systems (perhaps via some non-LLM architecture) that the system can now reason and integrate facts at least as well as a typical STEM scientist.
I don’t expect that this kills us or escapes its lab, if it is treated cautiously. I think Buck & Ryan’s control scheme works well here to harness capabilities without allowing harms, with only a moderate safety tax.
The speed-up to AI research could now truly be called RSI. The time to the next level up, the more powerful version, may be only a few months if the research proceeds at full speed. This is where we would depend on organizational adequacy to protect us, to restrain the temptation of some researchers to accelerate at full speed. This is also where we have the chance to instead turn the focus fully onto alignment, defensive technology acceleration, and human augmentation. This is probably enough AI power to grant the leader decisive economic/technological/military power and allow them to take action to globally halt further racing towards AI. In other words, this is the first point at which I believe we could safely ‘pause’, although given that the ‘pause’ would look like pretty drastic actions and substantial use of human-level AI, I don’t think the ‘pause’ framing quite fits. Civilization can quickly be made safe.
If released into the world, it could probably evolve into something that could wipe out humanity. It could be quite hard to catch and delete in time if it couldn’t be proven to the relevant authorities that the risk was real and required drastic action. Not a definite game-over, though, just high risk.
Mildly Superhuman AGI (weak ASI)
Not only knows more about the world than any human ever has, but also integrates this information and reasons about it more effectively than any human could. Pretty dangerous even in the lab, but it could still be safely controlled via careful schemes: for instance, keeping it impaired with slow-downs, deliberately censored and misleading datasets/simulations, and noise injection into its activations (a toy sketch of that last knob follows below). Deploying it safely would likely require so much impairment that it wouldn’t end up any more useful than the merely-high-human-level AGI.
There would be a huge temptation to relax the deliberate impairments and harness the full power. Organizational inadequacy could strike here, failing to sufficiently restrain the operators.
If it escaped, it would definitely be able to destroy humanity within a fairly short time frame.
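To make the activation-noise impairment mentioned above concrete, here is a toy sketch using standard PyTorch forward hooks. It is an illustration only, assuming an ordinary `nn.Module`; the layers hooked and the noise scale are arbitrary choices, not a vetted impairment scheme.

```python
# Toy sketch: inject Gaussian noise into a model's intermediate activations via
# forward hooks. The noise scale and choice of layers are illustrative only.
import torch
import torch.nn as nn

def add_activation_noise(model: nn.Module, noise_std: float = 0.1):
    """Register hooks that add Gaussian noise to every Linear layer's output."""
    handles = []
    for module in model.modules():
        if isinstance(module, nn.Linear):
            def hook(mod, inputs, output, std=noise_std):
                return output + std * torch.randn_like(output)
            handles.append(module.register_forward_hook(hook))
    return handles  # call h.remove() on each handle to lift the impairment

# Example: impair a small MLP, then remove the hooks to restore full capability.
mlp = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
handles = add_activation_noise(mlp, noise_std=0.05)
noisy_out = mlp(torch.randn(1, 16))
for h in handles:
    h.remove()
```

Note that the impairment is trivially reversible by removing the hooks, which is part of why the temptation to relax it (mentioned above) is an organizational problem rather than a technical one.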
Strongly Superhuman
By the time you’ve relaxed your impairment and control measures enough to even measure how superhuman it is, you are already in serious danger from it. This is the level where you would need to seriously worry about lab employees getting mind-hacked in some way, or the lab’s compute equipment getting hacked. If we train this model and even try to test it at full power, we are in great danger.
If it escapes, it is game over.
Given these levels, I think you are technically correct that there is a point where, if we don’t pause, we are pretty much doomed. But I think that pause point is somewhere above human-level AGI.
Furthermore, I think that attempting to pause before we get to ‘enough AI power to make humanity safe’ leads to us actually putting humanity at greater risk.
How?
By continuing in our current state of great vulnerability to self-replicating weapons like bioweapons.
By putting research pressure on the pursuit of more efficient algorithms, resulting in a much more distributed, harder-to-control progress front that proceeds nearly as fast toward superhuman AGI. I think this rerouting would put us in far more danger of a containment breach, or of overshooting the target level of useful-but-not-suicidally-dangerous.
I don’t have a deeply confident argument that we should pause before the ‘slightly superhuman’ level, but I do think we will need to pause then, and I think getting humanity ready to pause takes something like 3 years, which is also about where my earliest possible timelines land. So I think we need to start laying the groundwork now so that, even if you think we shouldn’t pause until later, we are civilizationally ready to pause quite abruptly.
Well, it seems we are closer to agreement than we thought about when a pause could be good. I am unsure about the correct way to prep for the pause. I do think we are about 2-4 years from human-level AI, and could get to above-human-level within a year after that if we so chose.
What makes you say 3 years?
It’s more of an intuitive guess than anything particularly rigorous, but, like, it takes time for companies and nation-states and international communities to agree to things, we don’t seem anywhere close, there will be political forces opposing the pause, and 3 years seems like a generously short time, even if we got moderately lucky, to get all the necessary actors to pause in a stable way.
Honestly, my view is that, assuming a baseline level of competence where AI legislation stays inadequate until a crisis appears (and that crisis has to be fairly severe), whether a pause is net-positive depends fairly clearly on when it happens.
A pause 2-3 years before AI can take over everything is probably net-positive, but attempting to pause, say, 1 year or 6 months before AI can take over everything is plausibly net-negative, because I expect most of the pauses to essentially be pauses on giant training runs, which unfortunately introduces a lot of overhang risk from algorithmic advances. And I expect that as soon as very strong laws on AI are passed, AI will probably be either 1 OOM of compute away from takeover, or already capable of takeover given new algorithms, which becomes a massive problem.
In essence, overhangs are the reason I expect the MNM effect to backfire/have a negative effect for AI regulation in a way it doesn’t for other regulation:
https://www.lesswrong.com/posts/EgdHK523ZM4zPiX5q/coronavirus-as-a-test-run-for-x-risks#Implications_for_X_risks