(I think most of the hard-to-handle risk from scheming comes from cases where we can’t easily make comparably smart AIs which we know aren’t schemers. If we can get another copy of the AI which is just as smart but which has been “de-agentified”, then I don’t think scheming poses a substantial threat. (Because e.g. we can just deploy this second model as a monitor for the first.) My guess is that a “world-model” vs “agent” distinction isn’t going to be very real in practice. (And for an AI to be good at reasoning about the world, it will need to be actively agentic in much the same way that your own reasoning is agentic.) Of course, there are risks other than scheming.)