Part of the trouble with this is that we don’t really know what kind of demonstrations would be within the power of a superintelligent AI. If the coin comes up tails, do you get to say “I’ve got a rigorous proof of my friendliness which I can show you” on the presumption that you can mindhack the reader into thinking they’ve seen a rigorous proof? Do you get to say it if the coin came up tails on the presumption that a superintelligent AI could come up with a proof that a human could actually verify? Or do you declare it out of bounds because you can’t come up with such a proof yourself and don’t think a human would be able to check one that an AI came up with anyway?