Is there a one-stop-shop type of article presenting the AI doomer argument? I read the sequence posts related to AI doom, but they’re very scattered and geared more toward exploring ideas than toward presenting a solid, cohesive argument. Of course, I’m sure that was the approach that made sense at the time. But I was wondering whether some kind of canonical presentation of the AI doom argument has been written since then. Something on the “attempts to be logically sound” side of things.
If you’re looking for a recent, canonical one-stop shop, the answer is List of Lethalities.
List of Lethalities is by no means a “one-stop shop”. If you don’t agree with Eliezer on 90% of the relevant issues, it’s completely unconvincing. For example, in that article he takes as an assumption that an AGI will be godlike level omnipotent, and that it will default to murderism.
If you don’t agree with Eliezer on 90% of the relevant issues, it’s completely unconvincing.
Of course. What kind of miracle are you expecting?
It also doesn’t go into much depth on many of the main counterarguments, nor into enough detail to get anywhere close to “logically sound”. It’s not as condensed as I’d like, and it skips over a bunch of background. Still, it’s valuable, and it’s the closest thing to a one-post summary of why Eliezer is pessimistic about the outcome of AGI.
The main value of List of Lethalities as a one-stop shop is that you can read it and then point to roughly where you disagree with Eliezer. This is probably what you want if you’re looking for canonical arguments for AI risk. Then you can look further into that disagreement if you want.
Reading the rest of your comment very charitably: It looks like your disagreements are related to where AGI capability caps out, and whether default goals involve niceness to humans. Great!
If I read your comment more literally, my guess would be that you haven’t read List of Lethalities, or are happy to misrepresent positions you disagree with.
he takes as an assumption that an AGI will be godlike level omnipotent
He specifically defines a dangerous intelligence level as roughly the level required to design and build a nanosystem capable of building a nanosystem (or any of several alternative example capabilities; see point 3). Maybe your omnipotent gods are lame.
and that it will default to murderism
This is false. Maybe you are referring to the fact that there isn’t any section justifying instrumental convergence? But it does have a link, and it notes that it’s skipping over a bunch of background in that area (point -3). That would be a different assumption, but if you’re deliberately misrepresenting it, then that might be the part you are misrepresenting.
David Chalmers asked for one last year, but there isn’t one.
I might give the essence of the assumptions as something like: you can’t beat superintelligence; intelligence is independent of value; and human survival and flourishing require specific complex values that we don’t know how to specify.
But further pitfalls reveal themselves later, e.g. you may think you have specified human-friendly values correctly, but the AI may then interpret the specification in an unexpected way.
What is clearer than doom is that creation of superintelligent AI is an enormous gamble, because it means irreversibly handing control of the world to something non-human. Eliezer’s position is that you shouldn’t do that unless you absolutely know what you’re doing. The position of the would-be architects of superintelligent AI is that hopefully they can figure out everything needed for a happy ending in the course of their adventure.
One further point I would emphasize, in the light of the last few years of experience with generative AI, is the unpredictability of the output of these powerful systems. You can type in a prompt, and get back a text, an image, or a video, which is like nothing you anticipated, and sometimes it is very definitely not what you want. “Generative superintelligence” has the potential to produce a surprising and possibly “wrong” output that will transform the world and be impossible to undo.
I’d actually recommend Zvi’s On A List of Lethalities over the original, as a more readily understandable version that covers the same arguments.
I think this post is an excellent distillation of the AI doomer argument, and it importantly helps me understand why people think AI alignment is going to be difficult:
https://www.lesswrong.com/posts/wnkGXcAq4DCgY8HqA/a-case-for-ai-alignment-being-difficult
I think AGI Safety From First Principles by Richard Ngo is probably good.
I think AGI Ruin: A List of Lethalities is comprehensive but also sort of advanced and skips over the two basic bits.
Superintelligence FAQ [1] as well.
What I have noticed is that while there are cogent overviews of AI safety that don’t come to the extreme conclusion that we are all going to be killed by AI with high probability, and there are articles that do come to that conclusion without being at all rigorous or cogent, there aren’t any that do both. From that I conclude that there aren’t any good reasons to believe in extreme AI doom scenarios, and that you should disbelieve them. Others use more complicated reasoning, like “Yudkowsky is too intelligent to communicate his ideas to lesser mortals, but you should believe him anyway”.
(See @DPiepgrass saying something similar and of course getting downvoted).
@MitchellPorter supplies us with some examples of gappy arguments.
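human survival and flourishing require specific complex values that we don’t know how to specify
There’s no evidence that “human values” are even a coherent entity, and no reason to believe that any AI of any architecture would need them.
But further pitfalls reveal themselves later, e.g. you may think you have specified human-friendly values correctly, but the AI may then interpret the specification in an unexpected way.
What is clearer than doom is that creation of superintelligent AI is an enormous gamble, because it means irreversibly handing control of the world
Hang on a minute. Where does control of the world come from? Do we give it to the AI? Does it take it?
to something non-human. Eliezer’s position is that you shouldn’t do that unless you absolutely know what you’re doing. The position of the would-be architects of superintelligent AI is that hopefully they can figure out everything needed for a happy ending, in the course of their adventure.
One further point I would emphasize, in the light of the last few years of experience with generative AI, is the unpredictability of the output of these powerful systems. You can type in a prompt, and get back a text, an image, or a video, which is like nothing you anticipated, and sometimes it is very definitely not what you want. “Generative superintelligence” has the potential to produce a surprising and possibly “wrong” output that will transform the world and be impossible to undo.
Current generative AI has no ability to directly affect anything. Where would that come from?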
Perhaps see https://homosabiens.substack.com/p/deadly-by-default by Duncan Sabien.
I don’t know that “the AI doomer argument” is a coherent thing. At least I haven’t seen an attempt to gather or summarize it in an authoritative way. In fact, it’s not really an argument (as far as I’ve seen), it’s somewhere between a vibe and a prediction.
For me, when I’m in a doomer mood, it’s easy to give a high probability to the idea that humanity will be extinct fairly soon (it may take centuries to fully die out, but we will be on a fully irreversible path within 10-50 years, if we aren’t already). Note that this was a common belief long before AI was a thing: nuclear war/winter, ecological collapse, pandemics, etc. are pretty scary, and humans are fragile.
My optimistic “argument” is really not better-formed. Humans are clever, and when they can no longer ignore a problem, they solve it. We might lose 90%+ of the current global population, and a whole lot of supply-chain and tech capability, but that’s really only a few doublings lost, maybe a millennium to recover, and maybe we’ll be smarter/luckier in the next cycle.
From your perspective, what do you think the argument is, in terms of thesis and support?
There are a lot of detailed arguments for doom by misaligned AGI.
Coming to grips with them, and with the counterarguments in actual proposals for aligning AGI and managing the political and economic fallout, is a herculean task. I feel it’s taken me about two years, spending the majority of my work time on it, just to get my head mostly around most of the relevant arguments. Having done that, my p(doom) is still roughly 50%, with wide uncertainty for unknown unknowns still to be revealed or identified.
So if someone isn’t going to do that, I think the above summary is pretty accurate. Aligning AGI and managing the resulting shifts in the world are not easy, but they’re not impossible. Sometimes humans do amazing things. Sometimes they do amazingly stupid things. So again, roughly 50% from this much rougher method.
Here’s some near-future fiction:
In 2027 the trend that began in 2024 with OpenAI’s o1 reasoning system has continued. The compute required to run AI is no longer negligible compared to the cost of training it. Models reason over long periods of time. Their effective context windows are massive, they update their underlying models continuously, and they break tasks down into sub-tasks to be carried out in parallel. The base LLM they are built on is two generations ahead of GPT-4.
These systems are language model agents. They are built with self-understanding and can be configured for autonomy. They constitute proto-AGI: artificial intelligences that can perform much, but not all, of the intellectual work that humans can do (although even the work these AIs can do, they cannot necessarily do more cheaply than a human).
In 2029 people have spent over a year working hard to improve the scaffolding around proto-AGI to make it as useful as possible. Presently, the next-generation LLM foundation model is released. Now, with some further improvements to the reasoning and learning scaffolding, this is true AGI. It can perform any intellectual task that a human could (although it’s very expensive to run at full capacity). It is better at AI research than any human. But it is not superintelligence. It is still controllable and its thoughts are still legible. So, it is put to work on AI safety research. Of course, by this point much progress has already been made on AI safety, but it seems prudent to get the AGI to look into the problem and get its go-ahead before commencing the next training run. After a few months the AI declares it has found an acceptable safety approach. It spends some time on capabilities research, and then the training run for the next LLM begins.
In 2030 the next LLM is completed, and improved scaffolding is constructed. Now human-level AI is cheap, better-than-human AI is not too expensive, and the peak capabilities of the AI are almost alien. For a brief period the value of human labour skyrockets, with workers acting as puppets as the AI instructs them over video call to do its bidding. This is necessary due to a major robotics shortfall. Human puppet-workers work in mines, refineries, smelters, and factories, as well as in logistics, optics, and general infrastructure. Human bottlenecks need to be addressed. This takes a few months, but the ensuing robotics explosion is rapid and massive.
2031 is the year of the robotics explosion. The robots are physically optimised for their specific tasks, coordinate perfectly with other robots, are able to sustain peak performance, do not require pay, and are controlled by cleverer-than-human minds. These are all multiplicative factors for the robots’ productivity relative to human workers. Most robots are not humanoid, but let’s say a humanoid robot would cost $x. Per $x, robots in 2031 are 10,000 times more productive than a human. This might sound like a ridiculously high number: one robot the equivalent of 10,000 humans? But let’s do some rough math:
| Advantage | Productivity multiplier (relative to a skilled human) |
|---|---|
| Physically optimised for their specific tasks | 5 |
| Coordinate perfectly with other robots | 10 |
| Able to sustain peak performance | 5 |
| Do not require pay | 2 |
| Controlled by cleverer-than-human minds | 20 |
5*10*5*2*20 = 10,000
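A quick check of the table’s arithmetic, as a minimal Python sketch. The multipliers are the story’s own rough guesses; treating them as independent factors that multiply is the assumption the table already makes, and the dictionary keys are just labels chosen here.

```python
from math import prod

# Rough multipliers from the table (the story's guesses, relative to a skilled human).
multipliers = {
    "physically optimised for specific tasks": 5,
    "coordinate perfectly with other robots": 10,
    "sustain peak performance": 5,
    "no pay required": 2,
    "controlled by cleverer-than-human minds": 20,
}

# Independent advantages compound multiplicatively.
total = prod(multipliers.values())
print(f"Combined multiplier: {total:,}")

# Because the factors multiply, the result scales directly with each guess:
# a 10x (rather than 20x) "cleverer minds" factor halves the total.
alt = {**multipliers, "controlled by cleverer-than-human minds": 10}
print(f"With a 10x minds factor: {prod(alt.values()):,}")
```

The sensitivity line is there because a product of guesses is only as firm as its largest factor; each multiplier moves the headline figure proportionally.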
Suppose that a human can construct one robot per year (taking into account mining and all the intermediate logistics and manufacturing). With robots 10^4 times as productive as humans, each robot will construct an average of 10^4 robots per year. This is the robotics explosion. By the end of the year there will be 10^11 robots (more precisely, a number of robots cost-equivalent to 10^11 humanoid robots).
By 2032 there are 10^11 robots, each with the productivity of 10^4 skilled human workers. That is a total productivity equivalent to 10^15 skilled human workers. This is roughly 10^5 times the productivity of humanity in 2024. At this point trillions of advanced processing units have been constructed and are online. Industry expands through the Solar System. The number of robots continues to balloon. The rate of research and development accelerates rapidly. Human mind upload is achieved.
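The 2032 totals can be checked the same way. This sketch uses only the story’s figures plus one outside number, a world population of roughly 8 billion in 2024, included to show that the implied baseline is the right order of magnitude. It counts people rather than workers, so it is a loose comparison, and the variable names are illustrative.

```python
robots = 10**11          # robots (humanoid cost-equivalents) by 2032, per the story
per_robot = 10**4        # skilled-worker equivalents per robot, per the story
total_equiv = robots * per_robot
print(f"Total: {total_equiv:.0e} skilled-worker equivalents")

# "Roughly 10^5 times humanity's 2024 productivity" implies this baseline:
implied_baseline = total_equiv / 10**5
print(f"Implied 2024 baseline: {implied_baseline:.0e} worker equivalents")

# Outside figure (approximate): world population in 2024, about 8 billion.
world_population_2024 = 8e9
ratio = total_equiv / world_population_2024
print(f"Total / world population: {ratio:.2g}")
```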
This sounds highly plausible. There are some other dangers your scenario leaves out, which I tried to explore in If we solve alignment, do we die anyway?
Here is an experiment that demonstrates the unlikelihood of one potential AI outcome.
The outcome shown to be unlikely:
Aligned ASI is achieved sometime in the next couple of decades and each person is apportioned a sizable amount of compute to do with as they wish.
The experiment:
I have made a precommitment that I will, conditional on the outcome described above occurring, simulate billions of lives for myself, each indistinguishable from the life I have lived so far. By “indistinguishable” I do not necessarily mean identical (which might be impossible or expensive). All that is necessary is that each has similar amounts of suffering, scale, detail, imminent AGI, etc. I’ll set up these simulations so that in each of these simulated lives I will be transported at 4:00 pm on Dec 11 ’24 to a virtual personal utopia. Having precommitted to simulating these worlds, I should now expect to be transported into a personal utopia in three minutes’ time if this future is likely. And if I am not transported into a personal utopia, I should conclude that this future is unlikely.
Let’s see what happens...
It’s 4:00 pm and I didn’t get transported into utopia.
So, this outcome is unlikely.
QED
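The experiment’s reasoning can be written out as a Bayesian update. The sketch below is one interpretation, not the post’s own formalism: it assumes the self-sampling assumption (I should reason as if I am a randomly chosen observer among everyone with my experiences so far), counts each simulated copy as one such observer, and counts the original, un-simulated me as one more observer who is not transported. The function and parameter names are hypothetical; `n_untransported` models the first weak point raised in the next section.

```python
def posterior_outcome(prior: float, n_transported: int, n_untransported: int = 0) -> float:
    """P(outcome | not transported at 4:00 pm), under the self-sampling assumption.

    If the outcome occurs, the observers with my experiences are the original me
    (not transported), n_transported simulated copies that are transported, and
    n_untransported simulated copies that are not. If it does not occur, only the
    original me exists, and I am not transported.
    """
    observers = 1 + n_transported + n_untransported
    p_evidence_given_outcome = (1 + n_untransported) / observers
    p_evidence_given_no_outcome = 1.0
    numerator = prior * p_evidence_given_outcome
    return numerator / (numerator + (1 - prior) * p_evidence_given_no_outcome)


for prior in (0.1, 0.5, 0.9):
    post = posterior_outcome(prior, n_transported=10**9)
    print(f"prior {prior:.1f} -> posterior {post:.2e}")

# Weak point 1: as many untransported copies as transported ones greatly weakens the update.
print(posterior_outcome(0.5, n_transported=10**9, n_untransported=10**9))
```

With no simulations at all the posterior equals the prior; the update only bites because the precommitted copies outnumber the single original by a factor of about a billion.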
Potential weak points
I do see a couple of potential weak points in the logic of this experiment. Firstly, it might be the case that I’ll have reason to simulate many indistinguishable lives in which I do not get transported to utopia, which would throw off the math. But I can’t see why I’d choose to create simulations of myself in less-than-optimally-enjoyable lives unless I had good reason to, so I don’t think that objection holds.[1]
The other potential weak point is that perhaps I wouldn’t be willing to pay the opportunity cost of billions of years of personal utopia. Although billions of years of simulation is just a tiny proportion of my compute budget, it’s still billions of years that could otherwise have been spent in perfect virtual utopia. I think this is potentially a serious issue with the argument, although I will note that I don’t actually have to simulate an entire life for the experiment to work, just a few minutes around 4:00 pm on Dec 11 ’24, minutes which were vaguely enjoyable. To address this objection, the experiment could be carried out while euphoric (since the opportunity cost would then be lower).
Perhaps, as a prank response to this post, someone could use some of their compute budget to simulate lives in which I don’t get transported to utopia. But I think that there would be restrictions in place against running other people as anything other than p-zombies.