We can’t really make concrete statements about how these scenarios will work.
Why not? From where I’m sitting it sure seems like we can. We have all sorts of tools for analyzing the behavior of computer programs, and AIs are computer programs. We have an even longer history of analyzing engineering blueprints. We have information theory, which triggers big red warning signs when a solution seems more complex than it needs to be (as any nefarious solution would be). And we have cryptographic tools for demanding information from even the most powerful adversaries in ways that simply cannot be cheated.
So, saying we can never trust the output of a superhuman AI “because, superhuman!” seems naïve and ignorant at the very least.
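(As an illustration of the kind of cryptographic tool being gestured at here: a minimal sketch of a hash-based commitment scheme. The function names are ours, not from any particular library, and its security holds only if SHA-256 is hard to break, a caveat the next comment picks up on. The point is that the committer is bound to a value before revealing it and cannot later substitute another.)

```python
import hashlib
import secrets

def commit(value: bytes) -> tuple[bytes, bytes]:
    """Commit to `value`. Returns (commitment, opening nonce).
    Hiding: the commitment reveals nothing about `value`.
    Binding: opening it to a different value would require
    finding a SHA-256 collision."""
    nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(nonce + value).digest()
    return commitment, nonce

def verify(commitment: bytes, nonce: bytes, value: bytes) -> bool:
    """Check that (nonce, value) opens `commitment`."""
    return hashlib.sha256(nonce + value).digest() == commitment

# The adversary publishes the commitment first and reveals later.
c, n = commit(b"my prediction")
assert verify(c, n, b"my prediction")
assert not verify(c, n, b"a different prediction")
```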
We have cryptographic tools for demanding information from even the most powerful adversaries in ways that simply cannot be cheated.
It’s worth noting that for the most part, we don’t. Aside from highly limited techniques such as one-time pads, we merely have cryptographic tools for demanding information from adversaries with bounded computational power in ways that simply cannot be cheated as long as we assume one of several hardness conjectures.
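(For concreteness, a minimal sketch of the unconditional exception mentioned above. A one-time pad resists even computationally unbounded adversaries with no hardness conjectures at all, but the key must be truly random, as long as the message, and never reused, which is exactly what makes such techniques “highly limited.”)

```python
import secrets

def otp_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """One-time pad: XOR the message with a fresh, uniformly random
    key of the same length. Information-theoretically secure, but the
    key is as long as the message and must never be reused."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return ciphertext, key

def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is self-inverse, so decryption is the same operation."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

ct, key = otp_encrypt(b"attack at dawn")
assert otp_decrypt(ct, key) == b"attack at dawn"
```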
“With bounded computational power”: if that bound means that even if every atom in the known Universe were a computer, it would still take more than the age of the Universe to brute-force it… then it is safe to assume that even the most superintelligent AI couldn’t break it.
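(A back-of-the-envelope check of this conditional, using deliberately generous, entirely assumed figures: roughly 10^80 atoms, each testing 10^12 keys per second. Under those assumptions a 256-bit key space actually falls in well under a second, but key spaces of around 384 bits and up take vastly longer than the age of the Universe to exhaust.)

```python
# Rough orders of magnitude; all assumed, not measured.
ATOMS_IN_UNIVERSE = 1e80        # commonly cited estimate
KEYS_PER_ATOM_PER_SEC = 1e12    # generously fast per-atom "computer"
AGE_OF_UNIVERSE_SEC = 4.35e17   # ~13.8 billion years

def brute_force_seconds(key_bits: int) -> float:
    """Time to exhaust a 2**key_bits key space at the assumed rate."""
    return 2.0 ** key_bits / (ATOMS_IN_UNIVERSE * KEYS_PER_ATOM_PER_SEC)

for bits in (128, 256, 384, 512):
    t = brute_force_seconds(bits)
    print(f"{bits}-bit key space: {t:.2e} s "
          f"= {t / AGE_OF_UNIVERSE_SEC:.2e} x the age of the Universe")
```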
I think we’re saying the same thing, just with the added (and correct) clarification “as long as we assume one of several hardness conjectures.”
I work in cryptography, so I’m aware of its limitations. But this application is within the scope of things that are currently being worked on...