I’m not so convinced of this. Yes, for some complex safety properties, the world model will probably have to be very smart. However, this does not mean that you have to model everything: depending on your safety specification and use case, you may be able to factor out a huge amount of complexity. We know from existing cases that this is true on a small scale (for example, we can verify that a program is memory-safe without modelling the hardware it runs on at the level of individual transistors), so why should it not also be true on a medium or large scale?
For example, with a detailed model of the human body, you may be able to prove whether or not a given chemical could be harmful to ingest. This cannot be done with current tools, because we don’t have a detailed computational model of the human body (and even if we did, we would not be able to use it for scalable inference). However, this seems like the kind of thing that could plausibly be created in the not-so-long term using AI tools. And if we had such a model, we could prove many interesting safety properties for e.g. pharmaceutical development AIs (even if these AIs know many things that are not covered by this world model).
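To make the "factoring out complexity" point concrete, here is a minimal toy sketch, nothing like a real biochemical model: the world model is stood in for by a single made-up dose-response curve, and the Z3 SMT solver is asked to prove a safety property over every dose in a range, rather than merely spot-checking sample doses. The curve, thresholds, and units are all invented for illustration.

```python
# Toy sketch only: the "world model" here is a single made-up dose-response
# curve, and all thresholds are invented for illustration. The point is the
# shape of the argument: the solver PROVES the safety property for every
# dose in the range, rather than testing a few doses.
from z3 import Real, Solver, And, unsat

dose = Real("dose")                 # hypothetical dose in mg
toxicity = 0.002 * dose * dose      # assumed (fake) dose-response relation

SAFE_DOSE_LIMIT = 50                # claim: any dose up to 50 mg is safe
TOXICITY_THRESHOLD = 10             # invented harm threshold

s = Solver()
# The safety property holds iff its negation is unsatisfiable:
# "there exists a dose in [0, 50] whose toxicity exceeds the threshold".
s.add(And(dose >= 0, dose <= SAFE_DOSE_LIMIT, toxicity > TOXICITY_THRESHOLD))

if s.check() == unsat:
    print("Proved: no dose in [0, 50] exceeds the toxicity threshold.")
else:
    print("Counterexample dose:", s.model()[dose])
```

The point of the sketch is structural: the proof only needs the fragment of the world captured by the model, which is exactly the sense in which the rest of the complexity gets factored out.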
Suppose you had a world model which was as smart as GPT-3 (but magically interpretable). Do you think this would be useful for something?
I think that would be extremely useful, because it would tell us many things about how to implement cognitive algorithms. But I don’t think it would be very useful for proving safety properties (which I assume was your actual question). GPT-3’s capabilities are wide but shallow, whereas in most cases what we would need are capabilities that are narrow but deep.