On the other hand, maybe this could still be dangerous if P and P’ have shared instrumental goals with regard to your predictions for B?
Thanks for a thoughtful comment.
Assuming that P and P’ are perfectly antialigned, they won’t cooperate. However, they need to be exactly antialigned for this to work. If there is some obscure borderline object that P thinks is a paperclip and P’ thinks isn’t, they can work together to tile the universe with it.
I don’t think it would be that easy to change evolution into a reproductive fitness minimiser, or to negate a human’s values.
If P and P’ are antialigned, then in the scenario where you only listen to them when they agree, for any particular prediction at least one of them will consider disagreeing better than having you act on it. The game theory is a little complicated, but they aren’t being incentivised to report their actual predictions.
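Here is a toy sketch of that agreement argument. The modelling assumptions are mine, not part of the setup above: P gets some utility if you act on a prediction and some default utility if the two disagree and you ignore them, and P’ gets exactly the negation of both. The function name and the numbers are hypothetical.

```python
def prefers_disagreement(u_if_acted_on, u_default):
    """Return which agents strictly gain by forcing a disagreement.

    u_if_acted_on: P's utility if both agree and you act on the prediction.
    u_default:     P's utility if they disagree and you ignore them.
    P' is assumed perfectly antialigned, so its utilities are the negations.
    """
    gainers = []
    if u_default > u_if_acted_on:        # P is better off being ignored
        gainers.append("P")
    if -u_default > -u_if_acted_on:      # P' is better off being ignored
        gainers.append("P'")
    return gainers


# Hypothetical utilities (to P) of acting on each candidate prediction.
candidates = {"pred_1": 3.0, "pred_2": -2.0, "pred_3": 0.0}
default = 0.0
for name, u in candidates.items():
    print(name, prefers_disagreement(u, default))
```

Under these assumptions, whenever acting on a prediction differs in value from the default, exactly one of the two strictly prefers to disagree. Only the exact tie leaves both indifferent, so agreement is never strictly in both agents' interest.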
Actually, A has to be able to handle not only correct and competent adversaries, but deluded and half-mad ones too.
I think P would find it hard to be inscrutable. It is impossible to obfuscate arbitrary code.
I agree with your final point. Though for any particular string X, the fastest Turing machine to produce it is basically the one that just does print(X). This is why we use short TMs, not just fast ones.
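As a rough illustration of the short-versus-fast point, here is a sketch that uses Python expressions as a stand-in for Turing machines (that substitution is my assumption, as are the variable names). The literal program is the print(X) analogue: it does almost no work, but its length grows with X. The short program describes the same string in a handful of characters.

```python
# The target string: highly regular, so it has a short description.
X = "ab" * 1000

# "print(X)"-style program: the whole string is written out as a literal.
literal_program = repr(X)

# Short program: a compact expression that computes the same string.
short_program = '"ab" * 1000'

# Both programs produce exactly X when evaluated.
assert eval(literal_program) == X
assert eval(short_program) == X

# The literal program is far longer than the short one.
print(len(literal_program), len(short_program))
```

Ranking by speed alone would favour the literal program for every X, which tells you nothing about the structure of X. Ranking by length favours the compact description.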