I recently realized that I’ve been confused about an extremely basic concept: the difference between an Oracle and an autonomous agent.
This feels obvious in some sense. But actually, you can ‘get’ to any AI system via output behavior + robotics. If you can answer arbitrary questions, you can also answer the question ‘what’s the next move in this MDP?’, or less abstractly, ‘what’s the next turn of the (imaginary) steering wheel?’ (for a self-driving car). So the difference can’t be ‘an autonomous agent has a robotic component’.
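To make the reduction concrete, here is a minimal sketch (all function names are hypothetical, nothing here is a real system) of how a pure question-answerer can be wrapped into something that behaves like an agent; the point is that the wrapper, not the oracle, is what supplies the autonomy.

```python
# Hypothetical sketch: turning a question-answering oracle into an acting system.
# None of these functions refer to real APIs; they only illustrate the reduction.

def oracle_answer(question: str) -> str:
    """Stand-in for an oracle that can answer arbitrary questions."""
    raise NotImplementedError  # assumed capability, not implemented here

def next_action(observation: str) -> str:
    """Use the oracle as a policy by phrasing the MDP step as a question."""
    return oracle_answer(f"Given observation {observation!r}, what is the next action?")

def autonomous_loop(get_observation, execute_action, steps: int = 1000) -> None:
    """This loop is what makes the combined system autonomous: it uses the
    output channel on every step without waiting to be probed."""
    for _ in range(steps):
        execute_action(next_action(get_observation()))
```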
The essential difference seems to be that the former system only uses its output channels when it is probed, whereas the latter uses them autonomously. But I don’t ever hear people make this distinction. I think part of the reason I hadn’t internalized this as an axis before is that there is already the agent vs. non-agent distinction; but actually, the two axes are orthogonal. We can clearly have any of the four combinations of {agent, non-agent} × {autonomous, non-autonomous}.[1]
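For concreteness, here is a rough sketch of how I’m carving up the four quadrants (the specific examples are my own reading, not established terminology):

```python
# The two orthogonal axes, with one illustrative example per quadrant.
quadrants = {
    ("agent", "autonomous"): "a goal-directed system acting in the world on its own",   # top-right
    ("agent", "non-autonomous"): "an oracle: agent-like, but only outputs when asked",  # top-left
    ("non-agent", "autonomous"): "a self-driving car built from narrow services",       # bottom-right
    ("non-agent", "non-autonomous"): "classic tool AI, e.g. a calculator",              # bottom-left
}

for (agency, autonomy), example in quadrants.items():
    print(f"{agency:>9} × {autonomy:<14} -> {example}")
```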
It’s a pretty bad sign that I don’t know, without looking at the definition, whether ‘tool AI’ refers to the entire bottom half or just the bottom-left quadrant. (After looking: it seems to be just the latter.)
What led me to this was thinking about corrigibility. I think it is applicable to the entire top half, i.e. all agent-like systems, but it feels like a stronger requirement for the top right, autonomous agents. If you have an oracle, corrigibility seems to reduce to ‘don’t try to influence the user’s behavior through your answers’.
When I look at this, I am convinced by the arguments that we probably can’t just build Tool AI, but I super want the most powerful systems of the future to be non-autonomous. That just seems way safer without sacrificing a lot of performance. I think because of this, I’ve been thinking of IDA as trying to build non-autonomous systems (basically oracles), even though the sequence pretty clearly seems to have autonomous systems in mind.[2] On the other hand, Debate seems to be primarily aimed at non-autonomous systems, which (if true) is an interesting difference.
So is all of this just news to me, and actually everyone is aware of this distinction?
Two existing suggestions for how to avoid existential risk naturally fall out of this framing.
1. Go all the way to the left (even further than the picture implies) by giving the AI no output channels whatsoever. This is Microscope AI.
2. Go all the way to the bottom and avoid all agent-like systems, but allow autonomous systems like self-driving cars. This is (as I understand it) Comprehensive AI Services (CAIS).
[1] And if you added a third axis for ‘robotic/non-robotic’, we would end up with examples in all eight areas.
[2] I award myself an F- for doing this.