What do you think of the arguments in this post that it’s possible to make safety cases that don’t rely on the model being unlikely to be a schemer?
I’m somewhat optimistic that this is achievable, conditional on considerable effort being invested.
As always, this agenda needs more work, and over time we will learn more about which control techniques work in practice and which don’t.