I don’t believe in a clear distinction between interactions that are attacks and interactions that are not attacks. When a politician asks people to vote for him so that he can have power, he’s not engaging in “attacks” but if he wins he still gets power. Some of the things a politician says are more manipulative than others.
I think that AGI’s are more likely able to justify to themselves by engaging in power-seeking behavior that’s about simply being very persuasive then to justify misleading people by faking voices, so I’m more worried about them wielding power in ways that are easier to rationalize.
I don’t believe in a clear distinction between interactions that are attacks and interactions that are not attacks. When a politician asks people to vote for him so that he can have power, he’s not engaging in “attacks” but if he wins he still gets power. Some of the things a politician says are more manipulative than others.
I think that AGI’s are more likely able to justify to themselves by engaging in power-seeking behavior that’s about simply being very persuasive then to justify misleading people by faking voices, so I’m more worried about them wielding power in ways that are easier to rationalize.