Ten counter-arguments that AI is (not) an existential risk (for now)
This is a polemic in response to the ten arguments post. I’m not a regular LW poster, but I’m an AI researcher and a mild AI worrier.
I believe that AI progress, and the risks associated with it, is one of the most important things for humanity to figure out right now. And yet, in most discussions about x-risk, I find myself aligned with neither side.
My overall thesis about AI x-risk is that it’s absolutely real, but also far enough into the future that, at this moment, we should simply continue progress on both capabilities and safety. I’m not trying to argue that sufficiently powerful AI could never pose an x-risk; that belief seems rather silly.
Disclaimers:
This is largely thinking out loud, describing why I personally disagree (or agree) with the listed arguments. In the best case scenario, maybe I’ll convince someone, or someone will convince me—I’d hate to be wrong on this and actively contribute to our doom.
In some cases, I could go a few steps further in the discussion, anticipating a rebuttal to my arguments and a counter-rebuttal to that. I’m consciously not doing that, for several reasons that are hopefully intuitive and not important enough to list here.
I’m deliberately only commenting on the summaries, and not the entire body of work behind each summary, mainly to keep things tractable. If some piece of text convinces humanity about the seriousness of x-risk, it won’t be an enormous Harry Potter fanfiction (no offense). I like brevity.
Competent non-aligned agents
1. Humans will build AI systems that are ‘agents’, i.e. they will autonomously pursue goals
2. Humans won’t figure out how to make systems with goals that are compatible with human welfare and realizing human values
3. Such systems will be built or selected to be highly competent, and so gain the power to achieve their goals
4. Thus the future will be primarily controlled by AIs, who will direct it in ways that are at odds with long-run human welfare or the realization of human values
I agree that we will build agents—we already try to do that.
We only need them to be as aligned as they are powerful. A chatbot that doesn’t understand actions or function calling won’t be an x-risk, no matter how violently misaligned it is (see the sketch at the end of this section).
It does seem natural that we will favor competent systems. We will also favor aligned systems, or at least surface-aligned ones; in fact, the usefulness of a system is directly correlated with both its capabilities and its alignment.
There’s a huge leap from the previous point to stating that “the future will be primarily controlled by AIs”. I don’t even necessarily disagree in principle, but we’re nowhere near the level of capabilities that could lead to a future-controlling AI.
Overall this is the core x-risk argument that I completely agree with—but I think it’s unlikely we get there in the foreseeable future, with the current paradigms.
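To make the “actions or function calling” point concrete, here is a minimal sketch of a tool-calling loop. Everything in it is hypothetical and vendor-agnostic; the only point is that the model affects the world exclusively through the functions we choose to expose, so the damage a misaligned model can do is bounded by that whitelist rather than by its intentions.

```python
# Minimal sketch of a tool-calling agent loop. All names are hypothetical
# illustrations, not any particular vendor's API.
import json

def call_model(prompt: str) -> str:
    """Placeholder for a text-in, text-out model call."""
    # A pure chatbot stops here: whatever it "wants", it can only return text.
    return '{"tool": "get_weather", "args": {"city": "Berlin"}}'

# The system's "actions" are exactly the functions we expose to it.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def agent_step(prompt: str) -> str:
    reply = call_model(prompt)
    try:
        request = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain text answer, no action taken
    tool = TOOLS.get(request.get("tool"))
    if tool is None:
        return "Refused: unknown tool."  # capability is gated by the whitelist
    return tool(**request.get("args", {}))

print(agent_step("What's the weather in Berlin?"))  # "Sunny in Berlin"
```

The whitelist does all of the safety work in this toy version, which is exactly the “as aligned as they are powerful” point: the more consequential the exposed actions, the more weight the alignment of the system has to carry.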
Second species argument
1. Human dominance over other animal species is primarily due to humans having superior cognitive and coordination abilities
2. Therefore if another ‘species’ appears with abilities superior to those of humans, that species will become dominant over humans in the same way
3. AI will essentially be a ‘species’ with superior abilities to humans
4. Therefore AI will dominate humans
I fundamentally agree that intelligence is the main factor giving humanity its dominance over other species, and over (some of) our planet in general.
Possibly, but not necessarily. Through natural selection, humanity is wired to optimize for survival and reproduction. This gives us certain incentives (world domination). On the other hand, AIs that we create won’t necessarily have the same incentives.
Regarding points 3 and 4: this feels like a very vibe-based argument. It leans on an intuition that AI is like a species, and that a superior species would dominate us; therefore AI would dominate us.
Not a fan of this argument. Might be effective as an intuition pump if someone can’t even conceive of how a powerful AI could lead to x-risk, but I don’t take it too seriously.
Loss of control via inferiority
1. AI systems will become much more competent than humans at decision-making
2. Thus most decisions will probably be allocated to AI systems
3. If AI systems make most decisions, humans will lose control of the future
4. If humans have no control of the future, the future will probably be bad for humans
Sure—at some point in the future, maybe.
Maybe, maybe not. Humans tend to have a bit of an ego when it comes to letting a filthy machine make decisions for them. But I’ll bite the bullet.
There are several levels on which I disagree here. Firstly, we’re assuming that “humans” have control of the future in the first place. It’s hard to assign coherent agency to humanity as a whole; it’s more of a weird mess of conflicting incentives, and nobody really controls it. Secondly, if those AI systems are designed in the right way, they might just become the tools humanity uses to steer the future the way we want it.
This is largely irrelevant after my previous response—we don’t really control the future, plus I don’t buy the implication contained in this point.
This largely sounds like a rehash of the previous argument. AI will become more powerful, we can’t control it, we’re screwed. The argument has a different coat of paint, so my response is different, but ultimately the point is that an AI will take over the world with us as an under-species.
Loss of control via speed
1. Advances in AI will produce very rapid changes, in available AI technology, other technologies, and society
2. Faster changes reduce the ability for humans to exert meaningful control over events, because they need time to make non-random choices
3. The pace of relevant events could become so fast as to allow for negligible relevant human choice
4. If humans are not ongoingly involved in choosing the future, the future is likely to be bad by human lights
“Very rapid” is very vague. So far, there are many barriers to mass adoption of existing AI technologies. Could it accelerate? Sure. But pretty much all technological progress is exponential anyway.
This seems very general, but I’ll take it—if things happen faster, they’re harder to affect or control.
I’m not sure what time scale we’re talking about here. We could imagine a hypothetical superintelligent AGI that transcends within minutes of deployment and takes over the world, but this is extraordinarily unlikely. We could also call “a few months” very fast (e.g. for legislation), but on that timescale, much of our own decision-making can be accelerated too.
We are always involved in choosing the future, if nothing else, then by designing the AI that would go zoom.
In one sense, this argument is obviously true—if we get an AI that’s superintelligent, super-quickly, and misaligned, then we’re probably screwed because we won’t react in time. But it’s a spectrum, and the real x-risk is only on the extreme end of the spectrum.
Human non-alignment
1. People who broadly agree on good outcomes within the current world may, given much more power, choose outcomes that others would consider catastrophic
2. AI may empower some humans or human groups to bring about futures closer to what they would choose
3. From 1, that may be catastrophic according to the values of most other humans
Agreed—pretty much any moral system, taken to the extreme, will end up in some sort of totalitarian absurdity.
Agreed—if it’s entirely in the hands of one individual/group that doesn’t have restraint in their desire for control.
Agreed.
Here we go, full agreement. This really is an issue with any sufficiently powerful technology. If only one person/country had nukes, we’d probably be worse off than in the current multipolar situation. Can the same multipolar approach help in the specific case of AI? Maybe—that’s why I tend to favor open-source approaches, at least as of 2024 with the current state of capabilities. So far, for other technologies, we’re somehow handling things through governance, so we should keep doing this with AI—and everything else.
Catastrophic tools
1. There appear to be non-AI technologies that would pose a risk to humanity if developed
2. AI will markedly increase the speed of development of harmful non-AI technologies
3. AI will markedly increase the breadth of access to harmful non-AI technologies
4. Therefore AI development poses an existential risk to humanity
Nukes, MassiveCO2EmitterMachine9000, FalseVacuumDecayer: sure.
It will increase the speed of development of all technologies.
It will increase the breadth of access to all technologies.
It can pose a risk, or it can save us. Same as most technologies.
Most technologies have good and bad uses. AI will be a force multiplier, but if we can’t handle Nukes 2.0 obtained via AI, we probably can’t handle Nukes 2.0 obtained via good ol’ human effort. This is fundamentally an argument against technological progress in general, which could be a significantly larger argument. My overall stance is that technological progress is generally good.
Powerful black boxes
1. So far, humans have developed technology largely through understanding relevant mechanisms
2. AI systems developed in 2024 are created via repeatedly modifying random systems in the direction of desired behaviors, rather than being manually built, so the mechanisms the systems themselves ultimately use are not understood by human developers
3. Systems whose mechanisms are not understood are more likely to produce undesired consequences than well-understood systems
4. If such systems are powerful, then the scale of undesired consequences may be catastrophic
Sure, that’s often the case.
Yes and no. I personally hate the “do we understand NNs” debacle, because it entirely depends on what we mean by “understand”. We can’t convert an LLM into a decision tree (much to the annoyance of people who still insist that it’s all “if” statements). At the same time, there is a lot of research into interpreting transformers and NNs in general (see the sketch at the end of this section). It’s not inherently impossible for these systems to be interpretable.
Maybe a bit, but I suspect this is largely orthogonal. We don’t need to understand how each atom in a gas behaves to put it in an engine and produce work. If we understand it at a high level, that’s probably enough—and we have a lot of high-level interpretability research.
Take a small problem with a small model, multiply it by a lot for a powerful model, and you’ll get an x-risk. Or at least a higher likelihood of x-risk.
I was always a little bit anti-interpretability. Sure, it’s better if a model is more interpretable than less interpretable, but at the same time, we don’t need it to be fully interpretable to be powerful and aligned.
The core argument here seems to be that if the black boxes remain forever pitch black, and we multiply their potential side effects by a gazillion (in the limit of a “powerful” AI), then the consequences will be terrible. Which… sure, I guess. If it actually remains entirely inscrutable, and it becomes super powerful, then bad outcomes are more likely. But not by much in my opinion.
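As a deliberately simple illustration of the kind of high-level inspection mentioned above, here is a sketch that uses PyTorch forward hooks to read out a network’s intermediate activations. Real interpretability research (probing, circuit analysis, sparse autoencoders) goes far deeper; this only shows that the internals are not literally inaccessible.

```python
# Sketch: inspecting intermediate activations of a small network with
# PyTorch forward hooks. Illustrative only; the model and layer choice
# are arbitrary stand-ins.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Capture whatever the hidden layer computes on each forward pass.
model[1].register_forward_hook(save_activation("hidden_relu"))

x = torch.randn(8, 16)
logits = model(x)

print(activations["hidden_relu"].shape)                 # torch.Size([8, 32])
print((activations["hidden_relu"] > 0).float().mean())  # fraction of active units
```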
Multi-agent dynamics
1. Competition can produce outcomes undesirable to all parties, through selection pressure for the success of any behavior that survives well, or through high stakes situations where well-meaning actors’ best strategies are risky to all (as with nuclear weapons in the 20th Century)
2. AI will increase the intensity of relevant competitions
Tragedy of the commons; no disagreements here (a toy illustration follows at the end of this section).
Sure.
This feels like a fairly generic “force multiplier” argument. AI, just like any technology, will amplify everything that humans do. So if you take any human-caused risk, you can amplify it in the “powerful AI” limit to infinity and get an x-risk.
This goes back to technological progress in general. The same argument can be made for electricity, so while I agree in principle that it’s a risk, it’s not an extraordinary risk.
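A toy illustration of the competition dynamic from point 1, with made-up payoff numbers: each party’s individually best move is to defect, even though both prefer the outcome where everyone shows restraint.

```python
# Toy prisoner's-dilemma payoffs for the row player (illustrative numbers only).
# Mutual restraint beats mutual defection, yet restraint is never the
# individually best response, so competition drifts toward the worse outcome.
PAYOFF = {
    ("restrain", "restrain"): 3,
    ("restrain", "defect"): 0,
    ("defect", "restrain"): 5,
    ("defect", "defect"): 1,
}

def best_response(opponent_move: str) -> str:
    return max(("restrain", "defect"), key=lambda my: PAYOFF[(my, opponent_move)])

print(best_response("restrain"))  # 'defect' (5 > 3)
print(best_response("defect"))    # 'defect' (1 > 0)
# Both players defect and get 1 each, even though mutual restraint pays 3 each.
```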
Large impacts
1. AI development will have very large impacts, relative to the scale of human society
2. Large impacts generally raise the chance of large risks
As an AI optimist, I certainly hope so.
Generally, sure.
Once again, take AI as a mysterious multiplier for everything that we do. Take something bad that we (may) do, multiply it by AI, and you get an x-risk.
Expert opinion
1. The people best placed to judge the extent of existential risk from AI are AI researchers, forecasting experts, experts on AI risk, relevant social scientists, and some others
2. Median members of these groups frequently put substantial credence (e.g. 0.4% to 5%) on human extinction or similar disempowerment from AI
Hey, that’s me!
Laypeople should definitely consider expert opinion on things that they themselves are not that familiar with. So I agree that people generally should be aware of the risks, maybe even a bit worried. That being said, it’s not an argument that should significantly convince people who know enough to form their own informed opinions—something something appeal to authority.
Closing thoughts
I liked that post. It’s a coherent summary of the main AI x-risks that I can address. I largely agree with them in principle, but I’m still not convinced that any threat is imminent. Most discussions that I tried to have in the past usually started from step zero (“Let me explain why AI could even be a risk”), which is just boring and unproductive. Perhaps this will lead to something beyond that.