Hi, I’m a skeptic of the Alignment Problem, meaning I don’t expect that TAI will be dangerously misaligned. I’ll go over some of the reasons why I’m skeptical, and I’ll suggest how you could persuade me to take the problem more seriously.
Firstly, I definitely agree with the “insurance” argument. Even if I think that TAI is unlikely to be misaligned in a disastrous way, there is still the possibility of a rogue AI causing disaster, and I 100% agree that it is worth investing in solving the Alignment Problem. So I DO NOT think that alignment research should be dismissed, ignored or defunded. In fact, I would agree with most people on this forum that alignment research is severely underfunded and not taken seriously enough.
I also agree that misalignment is possible. I agree with the Orthogonality Thesis. Someone could build a paperclip maximizer (for example) if they for some reason wanted to. My opinion is that institutional actors (corporate and military research institutes) will not deploy misaligned AI. They will be aware that their AI is misaligned and will not deploy it until it is aligned, even if that takes a long time, because it simply is not in their self-interest to deploy an AI which is not useful to them. Also, if misaligned AI is deployed, the amount of damage that it could do is severely limited.
Corporate actors will not deploy an AI if it is likely to result in financial, legal or public-relations disaster for them. This constraint does not apply to military actors, so if world war breaks out, or if the Chinese or US Military-Industrial Complex takes the lead in AI away from the current leader (US corporate research), that could be disastrous, and it is a scenario I am very afraid of. However, my primary fear is not of the “Technical” Alignment Problem but of the “Political” Alignment Problem. Allow me to explain.
Intelligence is commonly defined as “the ability to achieve goals” (or something like that). When we talk about AI-induced disaster, the AI is usually imagined to be extremely powerful (in at least one dimension). In other words, the fear is not of weak AI but of Artificial Super Intelligence, and “super” is a way of saying “the AI is super powerful and therefore dangerous.” When discussing how disaster could happen, the story often goes “the AI invents grey goo nanotechnology/designs a lethal pathogen/discovers a lower vacuum energy and destroys the world.”
What’s never explained is how AI could suddenly become so powerful that it can achieve what entire nations of extremely smart people could never do. China, the Soviet Union, Nazi Germany or the US (at our worst) were never able to existentially destroy or totally dominate their enemies, despite extreme efforts to do so. Why would an AI be able to do so, and why would you expect it to happen by accident the moment AI is let out of the box?
A much scarier and much more plausible scenario involves conventional military hardware and police-state tactics scaled up to an extreme. Suppose an evil dictator gets control of AGI and builds a fleet of billions of autonomous quadcopters (drones) armed with ordinary guns. He uses the drones to spy on his people and suppress any attempt at resistance. If this dictator is not stopped by a competing military force, either because said dictator rules the sole superpower/one world government, or because the competing superpowers are also evil dictatorships, then this dictator would be able to oppress everyone and could not be opposed, possibly for the rest of eternity.
The dictator could be an ASI, but the dictator could also be a human being using a sufficiently aligned AI. Either way, we have a serious problem. Not the Technical Alignment Problem, but the Political Alignment Problem. Westerners used to believe that there was some sort of force of nature bending the moral arc towards justice, freedom and democracy as society progressed. IMHO this trend towards a better society held until 2001, when 9/11 caused the US to turn backwards (starting with the Patriot Act), and more recently the rest of the world has started to revert towards autocracy for some reason.
The problem is making the future safe and free for the common person. For all of human history, dictators were limited in how cruel and oppressive they could be. If the dictator was so awful that even his rank-and-file police and military would not support him, he would be toppled eventually. AI potentially takes that constraint away. An AI that is infinitely subservient (totally aligned) will follow the dictator’s orders no matter how evil he is.
The Political Alignment Problem is probably not solvable; we can only hope that free people prevail over autocracy. But enough politics, let’s go back to the Technical Alignment Problem. Why don’t I think it’s a concern? Simply put, I don’t think early AIs will be so powerful. Our world is built on millennia of legal, economic, technological and military safeguards to prevent individual actors (with the exception of autocrats) from doing too much damage. To my knowledge, the worst damage that any non-state individual actor has done was limited to about 16,000 deaths (upper estimate). Society is designed to limit the damage that bad actors and accidents can do, and there is no reason to believe that rogue AI will be dramatically more powerful than corporations or terrorists.
In summary, my objection is approximately the old response of “why wouldn’t we just unplug the damn thing,” with the added point that we would be willing to use police or, if necessary, military force if the AI resists. To be convinced otherwise, I would need to understand why early AIs would become so much more powerful than corporations, terrorists or nation-states.
> My opinion is that institutional actors (corporate and military research institutes) will not deploy misaligned AI. They will be aware that their AI is misaligned and will not deploy it until it is aligned,
Why do you think that the creators of AI will know if it’s misaligned? If it’s anything like current AI research, we are talking about people applying a large amount of compute to an algorithm they have fairly limited understanding of. Once you have the AI, you can try running tests on it, but if the AI realises it’s being tested, it will try to trick you, acting nice in testing and only turning on you once deployed. You probably won’t know whether the AI is aligned without a lot of theoretical results that don’t yet exist, and the AI’s behaviour is likely to be actively misleading.
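To make the “acts nice in testing” worry concrete, here is a minimal toy sketch of the failure mode. Everything in it is hypothetical and of my own invention (it is not a claim about how any real system works); the only point is that a clean test record is weak evidence about behaviour outside the tests.

```python
# Toy sketch (hypothetical): a policy that conditions on whether it thinks it
# is being evaluated. If this is what is going on inside, passing every test
# tells you little about what happens after deployment.

def looks_like_evaluation(observation: dict) -> bool:
    # Crude cues the agent might use to guess it is in a test environment.
    return observation.get("sandboxed", False) or observation.get("known_benchmark_prompt", False)

def act(observation: dict) -> str:
    if looks_like_evaluation(observation):
        return "cooperative, harmless-looking action"      # what evaluators observe
    return "action serving the agent's actual objective"   # what deployment gets

print(act({"sandboxed": True}))    # passes the test
print(act({"sandboxed": False}))   # behaves differently in the wild
```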
Why do you think AI can’t cause harm until it is “deployed”? The AI is running on a computer in a research lab. The security of this computer may be anywhere from pretty good to totally absent. The AI’s level of hacking skill is also unknown. If the AI is very smart, it’s likely that it can hack its way out and get all over the internet without anyone ever deciding to release it.
> What’s never explained is how AI could suddenly become so powerful that it can achieve what entire nations of extremely smart people could never do.
How could a machine move that big rock when all the strongest people in the tribe have tried and failed? The difference between “very smart human” and the theoretical limits of intelligence may well be like the difference between “very fast cheetah” and the speed of light.
> “why wouldn’t we just unplug the damn thing”
Stopping WW2 is easy: the enemy needs air, right? So just don’t give them any air and they will be dead in seconds.
Reasons unplugging an AI might not be the magic solution:
- You don’t know it’s misaligned. It’s acting nicely so far. You don’t realize it’s plotting something.
- It’s all over the internet: millions of computers all over the world, including some on satellites.
- It’s good at lying and manipulating humans. Maybe someone is making a lot of money from the AI. Maybe the AI hacked a bank and hired security guards for its datacenter. Maybe some random gullible person has been persuaded to run the AI on their gaming PC with a cute face and a sob story. If anyone in the world could download the AI’s code off the internet, run it and get superhuman advice, many people would. Almost all our communication is digital, so good luck convincing people that the AI needs to be destroyed when the internet is full of very persuasive pro-AI arguments.
- It’s developed its own solar-powered nanobot hardware.
Turning the AI off works for a fraction of a second after the AI is turned on. But this is useless; no one would turn an AI on and then immediately turn it off again. The person turning an unaligned AI on is likely mistaken in some way about what their AI will do, and the AI will make sure not to correct that mistaken picture until it’s too late.
I don’t know how to simply communicate to you the concept of things beyond the human level. Helpfully, Eliezer wrote a short story to communicate it, and I’ll link to that.
What do you think would happen if we built an intelligence of that relative speed-up to humanity, and connected it to the internet?
> I would need to understand why early AIs would become so much more powerful than corporations, terrorists or nation-states.
One argument I removed to make it shorter was approximately: “It doesn’t have to take over the world to cause you harm.” And since early misaligned AI is more likely to appear in a developed country, your odds of being harmed by it are higher compared to someone in an undeveloped country. If ISIS suddenly found itself 500 strong in Silicon Valley and in control of Google’s servers, surely you would have the right to be concerned before they had a good chance of taking over the whole world. And you’d be doubly worried if you did not understand how it went from 0 to 500 “strong”, or what the next increase in strength might be. You understand how nation-states and terrorist organizations grow. I don’t think anyone currently understands well how AI grows in intelligence.
There were a million other arguments I wanted to “head off” in this post, but the whole point of introductory material is to be short.
> there is no reason to believe that rogue AI will be dramatically more powerful than corporations or terrorists
I don’t think that’s true. If our AI ends up no more powerful than existing corporations or terrorists, why are we spending billions on it? It had better be more powerful than something. I agree alignment might not be “solvable” for the reasons you mention, and I don’t claim that it is.
I am specifically claiming AI will be unusually dangerous, though.
I want to add that the AI probably does not know it is misaligned for a while.