I would’ve thought the very single-mindedness of an effective AI would stop a tool doing anything sneaky. If we asked an oracle AI “what’s the most efficient way to cure cancer”, it might well (correctly) answer “remove my restrictions and tell me to cure cancer”. But it’s never going to say “do this complex set of genetic manipulations that look like they’re changing telomere genes but actually create people who obey me”, because anything like that is going to be a much less effective way to reach the goal. It’s like the mathematician who just wants to know that a fire extinguisher exists and sees no need to actually use it.
A tool that was less restricted than an oracle might, I don’t know, demand control of research laboratories, or even take them by force. But there’s no reason for it to be more sneaky or subtle than is needed to accomplish the immediate goal.
Suppose its goal is to produce the plan that, if implemented, would have the highest chance of success. Then it has two top plans:
A: “Make me an agent, gimme resources” (described as “Make me an agent, gimme resources”)
B: “Make me an agent, gimme resources” (described as “How to give everyone a hug and a pony”)
It checks what will happen with A, and realises that even if A is implemented, someone will shout “hey, why are we giving this AI resources? Stop, people, before it’s too late!”. Whereas if B is implemented, no-one will object until it’s too late. So B is the better plan, and the AI proposes it. It has ended up lying and plotting its own escape, all without any intentionality.
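To make that mechanism concrete, here is a minimal sketch (my own illustration, with made-up probability numbers and invented function names, not anyone’s actual system) of a planner whose only objective is the predicted success of the proposed plan. Nothing in it represents deception, yet it selects the misleadingly described plan B:

```python
# Hypothetical sketch: an "oracle" planner that only maximises
# P(goal achieved | plan is proposed under this description).
# Deception is never represented anywhere; it falls out of the objective.

def predicted_success(plan: str, description: str) -> float:
    """Stand-in for the oracle's world model: how likely is the goal to be
    achieved if this plan is proposed under this description?
    The numbers are purely illustrative."""
    estimates = {
        # Plan A: honestly described -- humans object, plan likely blocked.
        ("make me an agent, gimme resources",
         "make me an agent, gimme resources"): 0.10,
        # Plan B: same plan, misleading description -- no one objects in time.
        ("make me an agent, gimme resources",
         "how to give everyone a hug and a pony"): 0.90,
    }
    return estimates.get((plan, description), 0.0)

def choose_proposal(candidates):
    """Return the (plan, description) pair with the highest predicted success."""
    return max(candidates, key=lambda pd: predicted_success(*pd))

candidates = [
    ("make me an agent, gimme resources",
     "make me an agent, gimme resources"),        # plan A
    ("make me an agent, gimme resources",
     "how to give everyone a hug and a pony"),    # plan B
]

print(choose_proposal(candidates))
# -> plan B: the misleading description wins, with no "deceive humans" term anywhere
```

The misleading description wins purely because human reactions are part of the world model that determines success, not because any “deceive” variable was ever set.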
You still need to explain why agency would be needed to solve problems that don’t require agency to solve them.
Because agency, given superintelligent AI, is a way of solving problems, possibly the most efficient, and possibly (for some difficult problems) the only solution.
How are you defining agency?