I don’t see how I could ever know—or even be significantly confident—that a given pair of arbiters X and Y were such that the probability that an AI can exploit arbiter X doesn’t correlate well with the probability that the AI can exploit arbiter Y.
That is, I don’t see how I could ever, even in principle, be significantly confident that their algorithms were “sufficiently different” for the probabilities to multiply.
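To make the worry concrete, here is a minimal arithmetic sketch (the marginal and conditional probabilities are hypothetical numbers chosen for illustration): the chance that both arbiters are exploited equals the product of the individual chances only if the exploit events are independent, and a shared root cause can make the joint probability far larger.

```python
# Hypothetical marginal probabilities that a payload exploits each arbiter.
p_x, p_y = 0.10, 0.10

# If the exploit events were independent, the probabilities would multiply.
p_both_independent = p_x * p_y  # 0.01

# But suppose the two arbiters share a flaw (e.g. a common parser), so that
# whenever X is exploited, Y is exploited 90% of the time (assumed figure).
p_y_given_x = 0.90
p_both_correlated = p_x * p_y_given_x  # 0.09 -- nine times larger

assert p_both_correlated > p_both_independent
```

The point of the sketch is that "sufficiently different algorithms" is exactly the claim that the conditional probability is close to the marginal one, and that is the claim that seems hard to ever be confident of.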
The same thing goes for knowing that whatever system is ensuring that X and Y receive the same payload hasn’t been corrupted. (ETA: That part has been clarified elsethread; thanks!)
Now, maybe what I know doesn’t matter: maybe we’re not talking about provable security. If you’re saying that we could build networks of arbiters and linkages between those arbiters in a secure box such that there was an unknowable but nonzero chance that a hostile superintelligence in that box had failed to corrupt those arbiters, then I agree with you, at least in principle.
The same thing goes for knowing that whatever system is ensuring that X and Y receive the same payload hasn’t been corrupted.
Actually, that part’s easy. The message can only corrupt things which both receive it as input and perform nontrivial computation on it. You don’t need an “arbiter” to copy a message, nor do you need to decrypt it.
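The fan-out step can be sketched as follows (a minimal illustration, not anyone’s actual system; the function name is hypothetical). The copying layer treats the payload as an opaque byte string and never parses or decrypts it, so there is no nontrivial computation at that stage for a malicious message to exploit:

```python
def forward(payload: bytes) -> tuple[bytes, bytes]:
    """Fan out an opaque payload to two arbiters.

    The payload is treated as an uninterpreted byte string: it is
    copied byte-for-byte, never parsed, decoded, or decrypted, so
    this layer performs no computation that depends on its contents.
    """
    return payload, payload  # identical copies, no inspection


a, b = forward(b"\x00\xffopaque ciphertext")
assert a == b  # both arbiters receive the same payload
```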