I think your discussion of why humanity could survive a misaligned superintelligence is missing a lot. Here are a couple of claims:
When there are ASIs in the world, we will see ~100 years of technological progress in 5 years (or like, what would have taken humanity 100 years in the absence of AI). This will involve the development of many very lethal technologies.
The aligned AIs will fail to defend the world against at least one of those technologies.
Why do I believe point 2? It seems like the burden of proof is really high to say that “nope, every single one of those dangerous technologies is going to be something that it is technically possible for the aligned AIs to defend against, and they will have enough lead time to do so, in every single case”. If you’re assuming we’re in a world with misaligned ASIs, then every single existentially dangerous technology is another disjunctive source of risk. Looking out at the maybe-existentially-dangerous technologies that have been developed previously and that could be developed in the future, e.g., nuclear weapons, biological weapons, mirror bacteria, false vacuum decay, nanobots, I don’t feel particularly hopeful that we will avoid catastrophe. We’ve survived nuclear weapons so far, but with a few very close calls — if you assume other existentially dangerous technologies go like this, then we probably won’t make it past a few of them. Now crunch that all into a few years, and like gosh it seems like a ton of unjustified optimism to think we’ll survive every one of these challenges.
It’s pretty hard to convey my intuition around the vulnerable world hypothesis; I also try to do so here.
I think there’s a spectrum of belief regarding AGI power and danger.
There are people who are optimistic about AGI (but who worry about bad human users):
Eric Drexler (“Reframing Superintelligence” + LLMs + 4 years)
A Solution for AGI/ASI Safety
This post
They often think the “good AGI” will keep the “bad AGI” in check. I really disagree with that because:
The “population of AGI” is nothing like the population of humans; it is far more homogeneous, because the most powerful AGI can just copy itself until it takes over most of the compute. If we fail to align them, different AGIs will end up misaligned for the same reason.
Eric Drexler envisions humans equipped with AI services acting as the good AGI. But having a human control enough decisions to ensure alignment will slow things down.
If the first ASI is bad, it may build self-replicating machines/nanobots.
There are people who worry about slow takeoff risks:
Redwood
This comment by Buck
Ryan Greenblatt’s comment above “winning a war against a rogue AI seems potentially doable, including a rogue AI which is substantially more capable than humans”
Dan Hendrycks’s views on AGI selection pressure
I think Anthropic’s view is here
Eric Drexler again (Applying superintelligence without collusion)
It looks like your comment is here
They are worried about “von Neumann-level AGI,” which poses a threat to humanity because it could build mirror bacteria and threaten humanity into following its will. The belief is that a war between it and humanity would be drawn out and uncertain, and there may be negotiations.
They may imagine good AGI and bad AGI existing at the same time, but aren’t sure the good ones will win. Dan Hendrycks’s view is that the AGI will start off aligned, but humanity may become economically dependent on it and fall for its propaganda until it evolves into misalignment.
Finally, there are people who worry about fast takeoff risks:
The Case Against AI Control Research
MIRI
Nick Bostrom
They believe that von Neumann-level AGI will not pose much direct risk, but it will be better than humans at AI research (imagine a million AI researchers), and will recursively self-improve to superintelligence.
The idea is that AI research powered by the AIs themselves will be limited by the speed of computers, not the speed of human neurons, so its speed need not resemble the speed of human research at all. Truly optimal AI research probably needs only tiny amounts of compute to reach superintelligence. DeepSeek’s cutting-edge AI supposedly took only $6 million, while four US companies spent around $210 billion on infrastructure (mostly for AI); see the back-of-envelope sketch after this list.
Superintelligence will not need to threaten humans with bioweapons or fight a protracted war. Once it actually escapes, it will defeat humanity with absolute ease. It can build self-replicating nanofactories which grow as fast as bacteria and fungi, and which form body plans as sophisticated as those of animals.
Soon after it builds physical machines, it expands across the universe as close to the speed of light as physically possible.
These people worry about the first AGI/ASI being misaligned, but don’t worry about the second one as much because the first one would have already destroyed the world or saved the world permanently.
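To make the scale gap concrete, here is a minimal back-of-envelope sketch (my own, using only the two dollar figures quoted above; neither figure is independently verified here):

```python
# Back-of-envelope comparison of the two dollar figures quoted above.
# Both numbers come from the comment itself; nothing here is verified.

deepseek_training_cost_usd = 6e6   # claimed cost of DeepSeek's training run (~$6M)
us_infra_spend_usd = 210e9         # reported infrastructure spend by four US companies (~$210B)

ratio = us_infra_spend_usd / deepseek_training_cost_usd
print(f"Infrastructure spend is roughly {ratio:,.0f}x the claimed training cost")
# -> Infrastructure spend is roughly 35,000x the claimed training cost
```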
I consider myself split between the second group and third group.
FWIW, I think recursive self-improvement via just software (a software-only singularity) is reasonably likely to be feasible (perhaps 55%), but this alone doesn’t suffice for takeoff being arbitrarily fast.
Further, even an objectively very fast takeoff (von Neumann-level to superintelligence in 6 months) can be enough time to win a war, etc.
I agree: a lot of outcomes are possible, and there’s no reason to think only fast takeoffs are dangerous and likely.
Also, I went too far in saying that it “needs only tiny amounts of compute to reach superintelligence” without caveats. The $6 million figure is disputed by a video arguing that DeepSeek used far more compute than they admit to.
The prior reference is a Dylan Patel tweet from Nov 2024, in the wake of the R1-Lite-Preview release:
Deepseek has over 50k Hopper GPUs to be clear.
People need to stop acting like they only have that 10k A100 cluster.
They are omega cracked on ML research and infra management but they aren’t doing it with that many fewer GPUs
DeepSeek explicitly states that
DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.
This seems unlikely to be a lie; the risk of reputational damage would’ve motivated not mentioning the amount of compute at all instead. But the most interesting thing about DeepSeek-V3 is precisely this claim: that its quality is possible with so little compute.
Certainly designing the architecture, the data mix, and the training process that made it possible required much more compute than the final training run, so in total it cost much more to develop than $6 million. And the 50K H100/H800 system is one way to go about that, though renting a bunch of 512-GPU instances from various clouds probably would’ve sufficed as well.
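As a rough sanity check on how the GPU-hour figure maps onto the widely quoted ~$6 million, here is a minimal sketch; the ~$2 per H800 GPU hour rental rate is an assumption on my part (roughly typical cloud pricing), not a number from the comments above:

```python
# Converting the reported 2.788M H800 GPU hours into a rough dollar cost.
# ASSUMPTION: ~$2 per H800 GPU hour rental price; this rate is my own estimate,
# not a figure from the thread.

gpu_hours = 2.788e6                  # reported H800 GPU hours for the full training run
assumed_rate_usd_per_gpu_hour = 2.0  # assumed rental price per GPU hour

estimated_cost_usd = gpu_hours * assumed_rate_usd_per_gpu_hour
print(f"Estimated final-run cost: ${estimated_cost_usd / 1e6:.2f}M")
# -> Estimated final-run cost: $5.58M (in the ballpark of the quoted ~$6M)
```

As noted above, this would cover only the final training run, not the experiments that led up to it.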
I see, thank you for the info!
I don’t actually know about DeepSeek V3; I just felt “if I point out the $6 million claim in my argument, I shouldn’t hide the fact that I watched a video which made me doubt it.”
I wanted to include the video as a caveat just in case the $6 million was wrong.
Your explanation suggests the $6 million is still in the ballpark (for the final training run), so the concerns about a “software-only singularity” are still very realistic.
I agree with this, with the caveat that synthetic data usage, especially targeted synthetic data used as part of alignment efforts, can make real differences in AI values. But yes, this is a big factor people underestimate.
How does that relate to homogeneity?
Mostly, it means that while the AIs produced within a single company are likely to have homogeneous values and traits, different firms will have AIs with differing values, meaning that the values and experiences of AIs will be drastically different across firms.
See my response to ryan_greenblatt (I don’t know how to link comments here). Your claim is that the defense/offense ratio is infinite. I don’t know why that would be the case.
Crucially, I am not saying that we are guaranteed to end up in a good place, or that superhuman unaligned ASIs cannot destroy the world. Just that if they are completely dominated (so not like the nuke ratio between the US and Russia, but more like that between the US and North Korea), then we should be able to keep them at bay.
Hm, sorry, I did not mean to imply that the defense/offense ratio is infinite. It’s hard to know, but I expect it’s finite for the vast majority of dangerous technologies[1]. I do think there are cases where the resources and intelligence needed for defense are too high for a civilization to muster. If an asteroid were headed for Earth 200 years ago, we simply would not have been able to do anything to stop it. Asteroid defense is not impossible in principle (the defensive resources and intelligence needed are not infinite), but they are certainly above what 1825 humanity could have mustered in a few years. It’s not impossible in principle, but it was impossible for 1825 humanity.
While defense/offense ratios are relevant, I was more trying to make the points that these are disjunctive threats, that some might be hard to defend against (i.e., have a high defense/offense ratio), and that we’ll have to mount those defenses on a very short timeframe. I think this argument goes through unless one is fairly optimistic about the defense/offense ratio for all the technologies that get developed rapidly. I think the argumentative/evidential burden for being on net optimistic about this situation is thus pretty high and, per the public arguments I have seen, unmet.
(I think it’s possible I’ve made some heinous reasoning error that places too much burden on the optimism case; if that’s true, somebody please point it out.)
To be clear, it certainly seems plausible that some technologies have a defense/offense ratio which is basically unachievable with conventional defense, and that you need to do something like mass surveillance to deal with these. For example, triggering vacuum decay seems like the type of thing where there may not be any technological response that averts catastrophe once the decay has started; instead, the only effective defenses are ones that stop anybody from doing the thing to begin with.
You link a comment by clicking the timestamp next to the username (which, now that I say it, does seem quite unintuitive… Maybe it should also be possible via the three dots on the right side).