The Vox article also mistakes the source of influence-seeking patterns, treating it as social influence rather than the point that “systems that try to increase in power and numbers tend to do so, and so are selected for if we accidentally or intentionally produce them and don’t effectively weed them out; this is why living things are adapted to survive and expand; such desires motivate conflict with humans when power and reproduction can be obtained by conflict with humans, which can look like robot armies taking control.”
Yes, I agree the Vox article made this mistake. Me saying “influence” probably gives people the wrong idea, so I should change that—I’m including “controls the military” as a central example, but it’s not what comes to mind when you hear “influence.” I like “influence” more than “power” because it’s more specific, captures what we actually care about, and is less likely to lead to a debate about “what is power anyway.”
In general I think the Vox article’s discussion of Part II has some problems, and the discussion of Part I is closer to the mark. (Part I is also more in line with the narrative of the article, since Part II really is more like Terminator. I’m not sure which way the causality goes here though, i.e. whether they ended up with that narrative based on misunderstandings about Part II or whether they framed Part II in a way that made it more consistent with the narrative, maybe having been inspired to write the piece based on Part I.)
There is a different mistake with the same flavor later in the Vox article: “But eventually, the algorithms’ incentives to expand influence might start to overtake their incentives to achieve the specified goal. That, in turn, makes the AI system worse at achieving its intended goal, which increases the odds of some terrible failure.”
The problem isn’t really that “the AI system is worse at achieving its intended goal”; like you say, it’s that influence-seeking AI systems will eventually be in conflict with humans, and that’s bad news if AI systems are much more capable/powerful than we are.
[AI systems] wind up controlling or creating that military power and expropriating humanity (which couldn’t fight back thereafter even if unified)
Failure would presumably occur before we get to the stage of “robot army can defeat unified humanity”—failure should happen soon after it becomes possible, and there are easier ways to fail than to win a clean war. Emphasizing this may give people the wrong idea, since it makes unity and stability seem like a solution rather than a stopgap. But emphasizing the robot army seems to have a similar problem—it doesn’t really matter whether there is a literal robot army, you are in trouble anyway.
I agree other powerful tools can achieve the same outcome, and since in practice humanity isn’t unified, rogue AI could act earlier; but either way you get to AI controlling the means of coercive force, which helps people understand the end state that is reached.
It’s good both to understand the events by which one is shifted onto the bad trajectory and to be clear on what that trajectory is. It sounds like your focus on the former may have interfered with the latter.
I do agree there was a miscommunication about the end state, and that language like “lots of obvious destruction” is an understatement.
I do still endorse “military leaders might issue an order and find it is ignored” (or total collapse of society) as basically accurate and not an understatement.