Thanks for the interesting post. I definitely agree that large-scale manipulation of human behavior and belief by misaligned AI is a real danger.
One thing I think could use a bit more precision here is the talk of “aligned AI”. I think it’s worth drawing a distinction between “seemingly aligned AI, which seems safe at first but actually warps human society/thought/behavior in negative ways” (almost-aligned AI) and “truly aligned AI, which actively protects humanity from the warping influences of powerful tech” (truly-aligned AI).
With this distinction, I see you as making two separate claims. Let me know whether you think I’m describing your viewpoint correctly.
Claim 1: we might mistake almost-aligned-AI for truly-aligned-AI and allow the almost-aligned-AI to make dangerous changes to human culture, behavior, and thought, stealing our agency, before we realize anything has gone wrong. If this process were allowed to continue too long, it could become very hard to stop or reverse, because the almost-aligned-AI would be able to use its manipulation to protect itself.
Claim 2: truly-aligned-AI is very hard to reach, and also very hard to distinguish from almost-aligned-AI, so we are quite likely to find ourselves in the scenario described in claim 1.
I’m not sure I would agree with claim 2 about the likelihood, but I definitely do agree with claim 1 about the danger.
I think it’s important to note that one way we could end up with an almost-aligned-AI is if what would otherwise have been a truly-aligned-AI were insufficiently resistant to misuse, and an unethical human deliberately used it to manipulate society in order to accumulate power. There are already malign, manipulative agents at work in society, but the danger becomes much greater as AI grows in capability. I think this is likely to become somewhat problematic before we reach full-blown self-aware, self-directing AGI that is truly aligned with humanity and thus protects us from being manipulated out of our agency.
I don’t have a clear sense of how much of a problem this will be, or for how long.
Thanks Nathan. I understand that most people working on technical AI-safety research focus on this specific problem, namely aligning AI, and less on misuse. I don’t expect a large AI-misuse audience here.
Your response, that “truly-aligned-AI” would not change human intent, was also suggested by other AI researchers. But this doesn’t address the problem: human intent is created from, and depends on, societal structures. Perhaps I failed to make this clear, but I was trying to suggest that we lack an understanding of the genesis of human actions, intentions, and goals, and thus cannot properly specify how human intent is constructed, nor how to protect it from interference and manipulation. A world imbued with AI technologies will change the societal landscape significantly, and potentially for the worse. I think many view human “intention” as a property of humans that acts on the world and is somehow isolated or protected from the physical and cultural world (see Fig 1a). But the opposite is actually true: human intents and goals are likely shaped significantly more by society than by biology.
The optimist statement: the best way I can interpret “truly-aligned-AI won’t change human agency” is to say that AI will help humans solve the free-will problem and will then work with us to redesign what human goals should be. But this latter statement is a very tall order (a United Nations statement that perhaps will never see the light of day...).