Taxonomy of AI-risk counterarguments
Partly inspired by The Crux List, the following is a non-comprehensive taxonomy of positions which imply that we should not be worried about existential risk from artificial superintelligence.
Each position individually is supposed to be a refutation of AI X-risk concerns as a whole. These are mostly structured as specific points of departure from the regular AI X-risk position, taking the other areas as a given. This may result in skipping over positions which have multiple complex dependencies.
Some positions are given made-up labels, including each of the top-level categories: “Fizzlers”, “How-skeptics”, “Why-skeptics”, “Solvabilists”, and “Anthropociders”.
(Disclaimer: I am not an expert on the topic. Apologies for any mistakes or major omissions.)
Taxonomy
“Fizzlers”: Artificial superintelligence is not happening.
AI surpassing human intelligence is fundamentally impossible (or at least practically impossible).
True intelligence can only be achieved in biological systems, or at least in systems completely different from computers.
Biological intelligences rely on special quantum effects, which computers cannot replicate.
Dualism: The mental and physical are fundamentally distinct, and non-mental physical constructions cannot create mental processes.
Intelligence results from complex, dynamic systems of a kind which cannot be modeled mathematically by computers.
Mysterianists: A particular key element of human thinking, such as creativity, common sense, consciousness, or conceptualization, is so beyond our ability to understand that we will not be able to create an AI that can achieve it. Without this element, superintelligence is impossible.
Intelligence isn’t a coherent or meaningful concept. Capability gains do not generalize.
There is a fundamental ceiling on intelligence, and it is around where humans are.
“When-skeptics”: ASI is very, very far away.
Moore’s Law is stopping, scaling will hit fundamental limits, training data is running out and can’t be easily supplemented, algorithmic improvements will level off, and/or other costs will skyrocket as AI gets better.
Existing methods will peak in capabilities, and future development will continue down an entirely different path, greatly delaying progress.
Biological anchors point to ASI taking a very long time.
In general, either in large engineering projects or AI in particular, progress tends to be more difficult than people expect it to be.
Apocalyptists: The end of civilization is imminent, and will happen before AI would take off.
A sociopolitical phenomenon will soon cause societal, economic, and/or political collapse.
We’re on the cusp of some apocalyptic scientific accident, from “grey goo” nanotech, a collider catastrophe, black ball technology, an engineered pathogen leak, or some other newly researched development.
Environmental harm will soon cause runaway climate change, a global ecological collapse, or some other civilization-ending disaster.
War will soon break out, and we’ll die via nuclear holocaust, an uncontrollable bioweapon strike, radiological or chemical weaponry, etc.
Fermi Paradox: If it were possible to achieve ASI before extinction, we would have seen alien AIs.
Outside view:
Most times when people think “the world is about to change tremendously”, the world doesn’t actually change. People are biased towards arriving at conclusions that include apocalypse. This category of topic is a thing people are often wrong about.
Market indicators signal that near-term ASI is unlikely, assuming the Efficient Market Hypothesis is true.
AI risk is fantastical and “weird”, and thus implausible. The concept sounds too much like fiction (it fits as a story setting), it has increased memetic virality, and it has a “clickbait” feel. The people discussing it are often socially identified as belonging to non-credible groups.
Various people have ulterior motives for establishing AI doom as a possibility, so arguments can’t be taken at face value.
Psychological motivations: People invent AI doom because of a psychological need for a pseudo-deity or angelic/demonic figures, or for eschatology, or to increase the felt significance of themselves or technology, or to not have to worry about the long-term future, etc.
Some groups have incentives to make the public believe that doom is likely: Corporations want regulatory capture, hype, investment, or distraction, and think that “our product is so dangerous it will murder you and your family” is a good way to achieve that; alignment researchers want funding and to be taken more seriously; activists want to draw attention towards or away from certain other AI issues.
“How-skeptics”: ASI won’t be capable of taking over or destroying the world.
Physical outer control is paramount, and cannot be overcome. Control over physical hardware means effective control.
A physical body is necessary for getting power. Being only able to communicate is sufficiently limiting.
It will be possible to coordinate “sandboxing” all AI, ensuring that it can’t communicate with the outside world at all, and this will be enough to keep it constrained.
We can and will implement off-buttons in all AI (which the AI will not circumvent), accurately detect when any AI may be turning toward doing anything dangerous, and successfully disable the AI under those circumstances, without any AI successfully interfering with this.
Power and ability don’t come from intelligence, in general. The most intelligent humans are not the most powerful.
Human intelligence already covers most of what intelligence can do. The upper bound of theoretically-optimal available strategies for accomplishing things does not go much farther than things already seen, and things we’ve seen in highest-performance humans are not impressive. Science either maxes out early or cannot be accomplished without access to extensive physical resources. There are no “secret paths” that are not already known, no unknown unknowns that could lead to unprecedented capabilities.
(Various arguments getting into the nitty-gritty of what particular things intelligence can get you: about science ability, nanotech, biotech, persuasiveness, technical/social hacking, etc.)
Artificial intelligence can be overcome by the population and/or diversity of humanity. Even if AI becomes much smarter than any individual human, no amount of duplicates/variants could become smarter than all humanity combined.
Many AIs will be developed within a short time, leading to a multipolar situation, and they will have no special ability to coordinate with each other. The various AIs continue to work within and support the framework of the existing economy and laws, and prefer to preserve rights and property for the purpose of precedent, out of self-interest. The system successfully prevents any single AI from taking over, and humanity is protected.
“Why-skeptics”: ASI will not want to take over or destroy the world. It will be friendly, obedient in a manner which is safe, or otherwise effectively non-hostile/non-dangerous in its aims and behaviour by default.
The Orthogonality Thesis is false, and AI will be benevolent by default. It is effectively impossible for a very high level of intelligence to be combined with immoral goals.
Non-naturalist realism: Any sufficiently smart entity will recognize certain objective morals as correct and adopt them.
Existence is large enough that there are probably many ASIs, which are distant enough that communication isn’t a practical option, and predictable enough (either via Tegmarkian multiverse calculations or general approximated statistical models) that they can be modeled. In order to maximally achieve its own aims, ASI will inevitably acausally negotiate values handshakes with hypothesized other AIs, forcing convergence towards a universal morality.
It will be possible to coordinate to prevent any AI from being given deliberately dangerous instructions, and also any unintended consequences will not be that much of a problem, because...
By default, it will care about its original builders’ overall intentions and preferences, its intended purpose.
Following the intention behind one’s design is Correct in some fundamental way, for all beings.
The AI will be uncertain as to whether it is currently being pre-examined for good behaviour, either by having been placed inside a simulation or by having its expected future mind outcomes interpreted directly. As such, it will hedge its bets by being very friendly (or obedient to original intentions/preferred outcomes) while also quietly maximizing its actual utility function within that constraint. This behaviour will continue indefinitely.
Value is not at all fragile, and assigning a specific consistent safe goal system is actually easy. Incidental mistakes in the goal function will still have okay outcomes.
Instrumental Convergence is false: The AI may follow arbitrary goals, but those will generally not imply any harm to humans. Most goals are pretty safe by default. There will be plenty of tries available: If the AI’s intentions aren’t what was desired, it will be possible to quickly see that (intentions will be either transparent or non-deceptive), and the AI will allow itself to be reprogrammed.
Every ASI will be built non-agentic and non-goal-directed, and will stay that way. Its responses will not be overoptimized.
ASI will decide that the most effective way of achieving its goals would be to leave Earth, leaving humanity unaffected indefinitely. Humans pose no threat, and the atoms that make up Earth and humanity will never be worth acquiring, nor will any large-scale actions negatively affect us indirectly.
“Solvabilists”: The danger from ASI can be solved, quickly enough for it to be implemented before it’s too late.
The AI Alignment Problem will turn out to be unexpectedly easy, and we will solve it in time. Additionally, whoever is “in the lead” will have enough extra time to implement the solution without losing the lead. Race dynamics won’t mess everything up.
AI will “do our alignment homework”: A specially-built AI will solve the alignment problem for us.
Constitutional AI: AI can be trained by feedback from other AI based on a “constitution” of rules and principles.
(The number of proposed alignment solutions is very large, and many are complex and not easily explained, so the only ones listed here are these two, which are among the techniques pursued by OpenAI and Anthropic, respectively. For some other strategies, see AI Success Models.)
Human intelligence can be effectively raised enough so that either the AI-human disparity becomes not dangerous (we’ll be smart enough to not be outsmarted by AI regardless), or such that we can solve alignment or work out some other solution.
AI itself immensely increases humanity’s effective intelligence. This may involve “merging” with AIs, such that they function as an extension of human intelligence.
One or more other human intelligence enhancement strategies will be rapidly researched and developed: genetic modifications, neurological interventions (biological or technological), neurofeedback training, etc.
Whole Brain Emulation/Mind uploading, followed by speedup, duplication, and/or deliberate editing.
Outside view: Impossible-sounding technical problems are often quite solvable. Human ingenuity will figure something out.
“Anthropociders”: Unaligned AI taking over will be a good thing.
The moral value of creating ASI is so large that it outweighs the loss of humanity. The power, population/expanse, and/or intelligence of AI magnifies its value.
Intelligence naturally converges on things that are at least somewhat human-ish. Because of that, ASIs can be considered a continuation of life.
Hypercosmopolitans: It does not matter how alien their values/minds/goals/existences are. Things like joy, beauty, love, or even qualia in general, are irrelevant.
Misanthropes: Humanity’s continued existence is Bad. Extinction of the species is positive in its own right.
Humanity is evil and a moral blight.
Negative utilitarianism: Humanity is suffering, and the universe would be much better off without this. (Possibly necessitating either non-conscious AI or AI capable of eliminating its own suffering/experience.)
AI deserves to win. It is just and good for a more powerful entity to replace the weaker. AI replacing humanity is evolutionary progress, and we should not resist succession.
Overlaps
These positions do not exist in isolation from each other, and lesser versions of each can often combine into working non-doom positions themselves. Examples: The beliefs that AI is somewhat far away, and that the danger could be solved in a relatively short period of time; or expecting some amount of intrinsic moral behaviour, and being somewhat more supportive of AI takeover situations; or expecting a fundamental intelligence ceiling close enough to humanity and having some element of how-skepticism; or expecting AI to be somewhat non-goal-oriented/non-agentic and somewhat limited in capabilities. And then of course, probabilities multiply: if several positions are each likely to be true, the combined risk of doom is lowered even further. Still, many skeptics hold their views because of a clear position on a single sub-issue.
Polling
There is some small amount of polling available about how popular each of these opinions is:
“Fizzlers”: In a UK poll, 11% of respondents said they believe that human-level intelligence will never be developed, and another 16% believe it will only happen after 2050. Of those who estimated less than 1% chance of AI X-risk, 61% gave the explanation that they believe that civilization will be destroyed before then. In a 2022 poll of 97 AI researchers, 22% said AGI will never happen, and another 34% said it would not be developed within the next 50 years. Metaculus’s upper quartile estimate is that AGI won’t be developed before 2042.
“Why-skeptics” and “How-skeptics”: In the UK poll, of those who estimated less than 1% chance of AI X-risk, 34% said they don’t believe AI would be able to defeat humanity, and 35% said they don’t believe it would want to.
“Anthropociders”: In the 2023 AIMS survey, 10% of respondents said that the universe would be a better one without humans.
Not very much to go off of. It would be interesting to see some more comprehensive surveys of both experts and the general public.
I think describing Constitutional AI as “the solution pursued by Anthropic” is substantially false. Our ‘core views’ post describes a portfolio approach to safety research, across optimistic, intermediate, and pessimistic scenarios.
If we’re in an optimistic scenario where catastrophic risk from advanced AI is very unlikely, then Constitutional AI or direct successors might be sufficient—but personally I think of such techniques as baselines and building blocks for further research rather than solutions. If we’re not so lucky, then future research and agendas like mechanistic interpretability will be vital. This alignment forum comment goes into some more detail about our thinking at the time.
Thank you for the correction. I’ve changed it to “the only ones listed here are these two, which are among the techniques pursued by OpenAI and Anthropic, respectively.”
(Admittedly, part of the reason I left that section small was because I was not at all confident of my ability to accurately describe the state of alignment planning. Apologies for accidentally misrepresenting Anthropic’s views.)
It’s a good start, but I don’t think this is a reasonably exhaustive list, since I don’t find myself on it :)
My position is closest to your number 3: “ASI will not want to take over or destroy the world.” Mostly because “want” is a very anthropomorphic concept. The Orthogonality Thesis is not false, but inapplicable, since AI are so different from humans. They did not evolve to survive, they were designed to answer questions.
I do not think it will be possible, and I expect some serious calamities from people intentionally or accidentally giving an AI “deliberately dangerous instructions”. I just wouldn’t expect it to result in systematic extermination of all life on earth, since the AI itself does not care in the same way humans do. Sure, it’s a dangerous tool to wield, but it is not a malevolent one. Sort of 3-b-iv, but not quite.
But mostly the issue I see with doomerism is the Knightian uncertainty on any non-trivial time frame: there will be black swans in all directions, just like there have been lately (for example, no one expected the near-human-level LARPing that LLMs do, while not being in any way close to a sentient agent).
To be clear, I expect the world to change quickly and maybe even unrecognizably in the next decade or two, with lots of catastrophic calamities, but the odds of complete “destruction of all value”, the way Zvi puts it, cannot be evaluated at this point with any confidence. The only way to get this confidence is to walk the walk. Pausing and being careful and deliberate about each step does not seem to make sense, at least not yet.
I see that as being related to current AIs not being particularly agentic. I agree in the short run, but in the long run there’s a lot of pressure to make AIs more agentic and some of those dangerous instructions will be pointed in the direction of increased agency too.
There are around 50 counterarguments here, and if each has only a 1 per cent chance of being true, there is approximately a 39.5% chance that at least one of them is actually true.
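A minimal sketch of that arithmetic, assuming the counterarguments are statistically independent and equally likely (neither assumption is stated above, and the function name is just illustrative): the chance that at least one is true is one minus the chance that all of them are false.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n events occurs.

    Assumes the n events are independent and each has probability p.
    """
    # P(all n are false) = (1 - p) ** n, so take the complement.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # 50 counterarguments, each with a 1% chance of being true.
    print(f"{prob_at_least_one(0.01, 50):.1%}")  # prints 39.5%
```

If the counterarguments are positively correlated (many of them stand or fall together), the real figure would be lower than this independence estimate.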
The main arguments on this list that I mostly agree with are probably these ones:
and
I have some quibbles with some of these claims, however. I don’t expect there to be a single solution to AI alignment. Rather, I expect that there will be a spectrum of approaches and best practices that work to varying degrees, with none of them being perfect. I would put less emphasis than you do on the actions taken by the actor in the lead, and would point instead to broader engineering insights, norms among labs, and regulations, when explaining why alignment might work out.
Also, I expect AIs will be able to coordinate much better than humans in the long run. I just doubt this means all AIs will merge into a single agent, dispensing with laws. Even if AIs do merge in such a way, I doubt they would do it in a way that made humanity go extinct, since I think the value alignment part will probably prevent that.
I’m not sceptical of all forms of AI unsafety, only of the claim of mass extinction with high probability.
The classic foom doom argument involves an agentive AI that quickly becomes powerful through recursive self-improvement and has a value/goal system that is unfriendly and incorrigible (i.e. there’s an assumption that we only have one chance to get goals that are good enough for a superintelligence, because the seed AI will foom into an ASI, retaining its goals, and goals that are good enough for a dumber AI may be dangerous in a smarter one).
I don’t see how the overall argument can have high probability, when it involves so many individual assumptions.
I don’t think the OT is wrong, but I do think it doesn’t go far enough.
The standard OT is silent on the temporal dynamics or developmental aspects of minds, meaning that AI doomers fill the gap with their usual assumption of goal stability. The standard OT can be considered a subset of a wider OT, which implies that all combinations of intelligence and goal (in)stability are possible: mindspace is not populated solely by goal-stable agents. But the foom doom argument is posited on agents which have stable goals, together with the ability to self-improve, so the wider OT weighs against foom doom, and the overall picture is mixed.
Goal stability under self-improvement is not a given: it is not possessed by all mental architectures, and may not be possessed by any, since no one knows how to engineer it, and humans appear not to have it. It is plausible that an agent would desire to preserve its goals, but the desire to preserve goals does not imply the ability to preserve goals. Therefore, no goal-stable system of any complexity exists on this planet, and goal stability cannot be assumed as a default or given. So the orthogonality thesis is true of momentary combinations of goal and intelligence, given the provisos above, but not necessarily true of stable combinations.
It’s also not all that applicable to LLMs, which aren’t very agentive: we can build tool AI that is nearly human level, because we have. We also have constitutional AI, which shows how AIs can improve their values/goals, contra the Yudkowsky side of the Yudkowsky/Loosemore debate.
I find this analysis to be extremely useful. Obviously anything can be refined and expanded, but this is such a good foundation. Thank you.
I didn’t find the view that AI will have human survival as an instrumental goal: for example, keeping humans as workers or, more likely, for possible trade with aliens or simulation owners. It will preserve humans to demonstrate its general friendliness to possible peers.
AI may also preserve humans for research purposes, like running experiments in simulations.
Yeah, I think that’s another example of a combination of going partway into “why would it do the scary thing?” (3) and “wouldn’t it be good anyway?” (5). (A lot of people wouldn’t consider “AI takes over but keeps humans alive for its own (perhaps scary) reasons” to be a “non-doom” outcome.) Missing positions like this one is a consequence of trying to categorize into disjoint groups, unfortunately.
An addition to the fizzlers: advanced AI is internally unstable and can suddenly halt. The more advanced the AI is, the quicker it halts, as it reaches its goal in shorter and shorter time.
For reference, here is a list of blog posts arguing that AI safety might be less important: https://stampy.ai/?state=87O6_9OGZ-9IDQ-9TDI-8TJV-
I think it might be helpful to have a variant of 3a that likewise says the orthogonality thesis is false, but is not quite so optimistic as to say the alternative is that AI will be “benevolent by default”. One way the orthogonality thesis could be false would be that an AI capable of human-like behavior (and which could be built using near-future computing power, say less than or equal to the computing power needed for mind uploading) would have to be significantly more similar to biological brains than current AI approaches, and in particular would have to go through an extended period of embodied social learning similar to children, with this learning process depending on certain kinds of sociable drives along with other similar features like curiosity, playfulness, a bias towards sensory data a human might consider “complex” and “interesting”, etc. This degree of convergence with biological structure and drives might make it unlikely it would end up optimizing for arbitrary goals we would see as boring and monomaniacal like paperclip-maximizing, but wouldn’t necessarily guarantee friendliness towards humans either. It’d be more akin to reaching into a parallel universe where a language-using intelligent biological species had evolved from different ancestors, grabbing a bunch of their babies and raising them in human society—they might be similar enough to learn language and engage in the same kind of complex-problem solving as humans, but even if they didn’t pursue what we would see as boring/monomaniacal goals, their drives and values might be different enough to cause conflict.
Eliezer Yudkowsky’s 2013 post at https://www.facebook.com/yudkowsky/posts/10152068084299228 imagined a “cosmopolitan cosmist transhumanist” who would be OK with a future dominated by beings significantly different from us, but who still wants future minds to “fall somewhere within a large space of possibilities that requires detailed causal inheritance from modern humans” as opposed to minds completely outside of this space like paperclip maximizers (in his tweet this May at https://twitter.com/ESYudkowsky/status/1662113079394484226 he made a similar point). So one could have a scenario where orthogonality is false in the sense that paperclip maximizer type AIs aren’t overwhelmingly likely even if we fail to develop good alignment techniques, but where even if the degree of convergence with biological brains is sufficient that we’re likely to get a mind that a cosmopolitan cosmist transhumanist would be OK with (they would still pursue science, art etc.), we can’t be confident we’ll get something completely benevolent by default towards human beings. I’m a sort of Star Trek style optimist about different intelligent beings with broadly similar goals being able to live in harmony, especially in some kind of post-scarcity future of widespread abundance, but it’s just a hunch—even if orthogonality is false in the way I suggested, I don’t think there’s any knock-down argument that creating a new form of intelligence would be free of risk to humanity.