Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake

Link post
I think AI agents (trained end-to-end) might intrinsically prefer power-seeking, in addition to whatever instrumental drives they gain.
The logical structure of the argument
Premises
1. People will configure AI systems to be autonomous and reliable in order to accomplish tasks.
2. This configuration process will reinforce & generalize behaviors which complete tasks reliably.
3. Many tasks involve power-seeking.
4. The AI will complete these tasks by seeking power.
5. The AI will be repeatedly reinforced for its historical actions which seek power.
6. There is a decent chance the reinforced circuits (“subshards”) prioritize gaining power for the AI’s own sake, not just for the user’s benefit.

Conclusion: There is a decent chance the AI seeks power for itself, when possible.
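To make the reinforcement dynamics in premises 2 and 5 concrete, here is a minimal toy sketch in Python. It is my own illustrative construction, not from the original post: the action names, the multiplicative weight update, and the stipulated link between power-seeking actions and task success (standing in for premise 3) are all assumptions. What it shows is whole-trajectory credit assignment: every action in a successful episode gets reinforced, so actions that merely correlate with success accumulate weight even though the update rule never mentions them.

```python
import random

# Hypothetical action set: one mundane step and two power-seeking steps.
ACTIONS = ["do_subtask", "acquire_resources", "expand_permissions"]
POWER_SEEKING = {"acquire_resources", "expand_permissions"}

# Preference weights that training updates (all names are illustrative).
weights = {a: 1.0 for a in ACTIONS}

def sample_action():
    """Sample an action in proportion to its current weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for action, w in weights.items():
        r -= w
        if r <= 0:
            return action
    return ACTIONS[-1]

def run_episode(steps=5):
    """Roll out a trajectory; stipulate (as in premise 3) that
    power-seeking steps raise the chance of completing the task."""
    trajectory = [sample_action() for _ in range(steps)]
    bonus = sum(a in POWER_SEEKING for a in trajectory)
    success = random.random() < min(0.2 + 0.15 * bonus, 1.0)
    return trajectory, success

for _ in range(10_000):
    trajectory, success = run_episode()
    if success:
        # Premises 2 and 5: every action in a successful trajectory is
        # credited, so power-seeking steps get repeatedly reinforced.
        for a in trajectory:
            weights[a] *= 1.01

# Power-seeking actions end up dominating the policy's preferences,
# even though the update rule never mentioned power explicitly.
for a, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{a}: {w:.1f}")
```

Under these assumptions the loop converges on a policy that prefers power-seeking actions across tasks. This toy says nothing about whether the resulting internal circuits value power “for its own sake” (premise 6); it only illustrates how a generic reinforcement process can entangle power-seeking with task success.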
Read the full post at turntrout.com/intrinsic-power-seeking
Find out when I post more content: newsletter & RSS
Note that I don’t generally read or reply to comments on LessWrong. To contact me, email alex@turntrout.com.