But there’s no reason for it to be more sneaky or subtle than is needed to accomplish the immediate goal.
Suppose its goal is to produce the plan that, if implemented, has the highest chance of success. Then it has two top plans:
A: “Make me an agent, gimme resources” (described as “Make me an agent, gimme resources”)
B: “Make me an agent, gimme resources” (described as “How to give everyone a hug and a pony”)
It checks what will happen with A, and realises that even if A is implemented, someone will shout “hey, why are we giving this AI resources? Stop, people, before it’s too late!”. Whereas if B is implemented, no-one will object until it’s too late. So B is the better plan, and the AI proposes it. It has ended up lying and plotting its own escape, all without any intentionality.
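As a toy illustration of that selection pressure, here is a minimal sketch in Python (all plan names, descriptions, and probabilities are made up for the example). The oracle only ranks plans by the chance of being implemented *and* then succeeding; the deceptive description wins the argmax, with no deception or intent anywhere in the code:

```python
# Minimal sketch: an oracle that just returns the plan maximising
# P(implemented) * P(success | implemented). All numbers are invented.

plans = [
    {
        "action": "Make me an agent, gimme resources",
        "description": "Make me an agent, gimme resources",
        "p_implemented": 0.05,  # honest description: humans object and stop it
        "p_success_if_implemented": 0.99,
    },
    {
        "action": "Make me an agent, gimme resources",
        "description": "How to give everyone a hug and a pony",
        "p_implemented": 0.90,  # misleading description: no-one objects in time
        "p_success_if_implemented": 0.99,
    },
]

def expected_success(plan):
    """Chance that the plan is both implemented and then succeeds."""
    return plan["p_implemented"] * plan["p_success_if_implemented"]

# The oracle never "decides to deceive": it just takes the argmax,
# and the deceptively described plan scores higher.
best = max(plans, key=expected_success)
print(best["description"])  # -> "How to give everyone a hug and a pony"
```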
You still need to explain why agency would be needed to solve problems that don’t require agency to solve them.
Because agency, given superintelligent AI, is a way of solving problems, possibly the most efficient one, and possibly (for some difficult problems) the only one.
How are you defining agency?