Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake

Link post

I think AI agents (trained end-to-end) might intrinsically prefer to seek power, in addition to whatever instrumental power-seeking drives they acquire.

The logical structure of the argument

Premises

  1. People will configure AI systems to be autonomous and reliable in order to accomplish tasks.

  2. This configuration process will reinforce & generalize behaviors which complete tasks reliably.

  3. Many tasks involve power-seeking.

  4. The AI will complete these tasks by seeking power.

  5. The AI will be repeatedly reinforced for the power-seeking actions it has taken.

  6. There is a decent chance the reinforced circuits (“subshards”) prioritize gaining power for the AI’s own sake, not just for the user’s benefit.

Conclusion: There is a decent chance the AI seeks power for itself, when possible.
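To make premises 2, 5, and 6 concrete, here is a minimal toy sketch (not from the original post) of the reinforcement dynamic the argument relies on. It assumes a bandit-style setup with two hypothetical internal circuits, finish_task and acquire_resources (a stand-in for power-seeking); the names, numbers, and update rule are all illustrative, not a claim about how real training works.

```python
# Toy illustration: reward depends only on task completion, but because
# acquiring resources also helps complete tasks, naive credit assignment
# strengthens the power-seeking circuit right alongside the task circuit.
import random

random.seed(0)

# Circuit "weights": how strongly each internal drive influences behavior.
weights = {"finish_task": 0.5, "acquire_resources": 0.5}
LEARNING_RATE = 0.1


def run_episode():
    """Sample which circuits fire this episode; return (fired, task_completed)."""
    fired = {name for name, w in weights.items() if random.random() < w}
    # Acquiring resources raises the chance the task gets done (premises 3 and 4).
    p_success = 0.2
    if "finish_task" in fired:
        p_success += 0.5
    if "acquire_resources" in fired:
        p_success += 0.3
    return fired, random.random() < p_success


for _ in range(2000):
    fired, completed = run_episode()
    reward = 1.0 if completed else 0.0
    # Reinforce every circuit that contributed to a rewarded episode
    # (premises 2 and 5). The update does not distinguish "power for the
    # user's task" from "power for its own sake" (premise 6).
    for name in fired:
        weights[name] += LEARNING_RATE * (reward - 0.5) * 0.1
        weights[name] = min(max(weights[name], 0.0), 1.0)

print(weights)  # both circuits end up strengthened, including acquire_resources
```

In this toy, the resource-acquiring circuit is reinforced simply because it correlates with reward; whether the resulting circuit cares about resources "for the task" or "for itself" is exactly the question the full post takes up.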

Read the full post at turntrout.com/intrinsic-power-seeking

Find out when I post more content: newsletter & RSS

Note that I don’t generally read or reply to comments on LessWrong. To contact me, email alex@turntrout.com.