In most such scenarios, the AI doesn’t have a terminal goal of getting rid of us, but rather has it as a subgoal that arises from some larger terminal goal. The idea of a “paperclip maximizer” is one example: a hypothetical AI is programmed to maximize the number of paperclips and then proceeds to try to do so throughout its future light cone.
If there is an AI that is interacting with humans, it may develop a theory of mind simply due to that. If one is interacting with entities that are a major part of one’s input, trying to predict and model their behavior is a straightforward thing to do. The more compelling argument in this sort of context would seem to me to be not that an AI won’t try to do so, but rather that humans are so complicated that a decent theory of mind will be extremely difficult. (For example, when one tries to give lists of behaviors and norms for autistic individuals, one never manages to get a complete list, and some of the more subtle ones, like sarcasm, are essentially impossible to convey in any reasonable fashion.)
I also don’t know how unlikely such paths are. A 1% or even a 2% chance of existential risk would be pretty high compared to other sources of existential risk.
So why not the opposite: why wouldn’t it have human intentions as a subgoal?
Because that’s like winning the lottery. Of all the possible things it can do with the atoms that comprise you, few would involve keeping you alive, let alone letting you live a life worth living.