(I think most of the hard-to-handle risk from scheming comes from cases where we can’t easily make comparably smart AIs which we know aren’t schemers. If we can get another copy of the AI which is just as smart but which has been “de-agentified”, then I don’t think scheming poses a substantial threat. (Because e.g. we can just deploy this second model as a monitor for the first.) My guess is that a “world-model” vs “agent” distinction isn’t going to be very real in practice. (And for an AI to be good at reasoning about the world, it will need to be actively agentic in much the same way that your own reasoning is agentic.) Of course, there are risks other than scheming.)