Is it possible to obtain a dataset of mathematical proofs with Oracle’s hidden messages without running a dangerous Oracle?
We can ask AI researcher to embed some hidden messages into several proofs and use it as a training dataset. But it will help to detect only those hidden messages from Oracle which are similar to what AI researcher wrote.
Gurkenglas already suggested that we have to restrict the Oracles to output only shortest proof. If the proof is shortest, doesn’t it imply the absence of hidden messages?
Regarding 5):
Is it possible to obtain a dataset of mathematical proofs with Oracle’s hidden messages without running a dangerous Oracle?
We can ask AI researcher to embed some hidden messages into several proofs and use it as a training dataset. But it will help to detect only those hidden messages from Oracle which are similar to what AI researcher wrote.
Gurkenglas already suggested that we have to restrict the Oracles to output only shortest proof. If the proof is shortest, doesn’t it imply the absence of hidden messages?
How we could be sure that the proof is actually the shortest?
Tell it to prove: That proof is shortest and this proof is shortest.
Another option is to run several different Oracles of phi and take the shortest proof
Mr Armstrong has specified that (ctrl-f) “other Oracles can’t generally be used on the same problem”.
“Can’t generally be used”; if you understand the setup and are careful, you might be able to do so.