Paul Christiano has explored the framing of interactive proofs before; see, for example, this or this.
I think this is an exciting framing for AI safety, since it gets to the crux of one of the issues, as you point out in your question.