I’ve always assumed there was some off-screen reason why interaction was necessary. Otherwise, it isn’t very interesting to note that a rock always wins as gatekeeper.
I’m not sure if that is a consensus interpretation.
It would be nice if we could formalize this offscreen reason...
Presumably the reason we created the AI in the first place is so it could solve problems for us, so someone needs to ask it questions. It might be safe-ish to restrict it to answering yes or no to purely mathematical questions.
I think yes/no to mathematical questions wouldn’t be safer than yes/no to other questions: you still have a real-world reason for asking them, and the AI can lie about as well as it can on any yes/no question.
Asking for formalized proofs might be more promising, if you are able to produce a proof-checker that you have high confidence can’t be hacked (this seems difficult but possible; proof checkers can be very simple programs, but you need to guard both against ordinary attacks on the entire stack you’re running on and against creative attacks like waiting for a cosmic ray to make a transistor malfunction). You’d also want to not have a human look at the proof, of course, since that would give the AI a much broader communication channel.
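To give a sense of how simple a proof checker can be, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy logic (atoms and implications only, with modus ponens as the single inference rule), the tuple encoding of formulas and steps, and the name `check_proof` are all hypothetical, and a real checker would need a much richer logic plus the hardening of the underlying stack discussed above.

```python
# Toy proof checker (illustrative sketch, not a hardened implementation).
# A formula is an atom (a string) or an implication ("->", antecedent, consequent).
# A proof step is ("premise", formula) or ("mp", i, j), where step i proves A
# and step j proves ("->", A, B), so the step derives B.

def check_proof(premises, proof, goal):
    """Return True only if every step is valid and the last step proves goal."""
    derived = []
    for step in proof:
        if not isinstance(step, tuple) or not step:
            return False
        if step[0] == "premise" and len(step) == 2:
            formula = step[1]
            if formula not in premises:
                return False
        elif step[0] == "mp" and len(step) == 3:
            i, j = step[1], step[2]
            if not (isinstance(i, int) and isinstance(j, int)):
                return False
            # Only earlier steps may be cited.
            if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                return False
            implication = derived[j]
            if not (isinstance(implication, tuple)
                    and len(implication) == 3
                    and implication[0] == "->"
                    and implication[1] == derived[i]):
                return False
            formula = implication[2]
        else:
            return False
        derived.append(formula)
    return bool(derived) and derived[-1] == goal


premises = ["p", ("->", "p", "q"), ("->", "q", "r")]
proof = [
    ("premise", "p"),
    ("premise", ("->", "p", "q")),
    ("mp", 0, 1),                   # derives q
    ("premise", ("->", "q", "r")),
    ("mp", 2, 3),                   # derives r
]
accepted = check_proof(premises, proof, "r")
```

The only thing that leaves the checker is a single accept/reject bit, which fits the idea that no human should read the proof itself.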