That seems to imply that if the desired behavior is more complex, it is more likely to result in cooperation with you. Why?
Also, such a default strategy would be wide open to exploitation: just make a program that (when played against yours) looks so complex that you can’t prove anything about it, even though that’s all just obfuscation and in the end it reliably defects.
What are the odds that a system that reliably defects will not have a proof that it defects, but that a system that cooperates if it is cooperated with will have no proof?
It’s less a matter of having a proof (or not having one), and more a matter of having a sufficiently short proof (or simple by other criteria) that your adversary will find it before they give up.
In general, I think that for any algorithm with behavior X and a proof of it, there’s another algorithm with identical behavior and no proof. But here we’re interested in algorithms that are chosen (by their makers) to be deliberately easy to prove things about.
What are the odds that a system that reliably defects will not have a proof that it defects,
It won’t want others to prove it will defect. So if others cooperate when they are unable to prove either way, it can always construct itself so that its defection is unprovable.
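To make the exploit concrete, here is a toy sketch of my own (not anything from the paper), with proof search reduced to a stub that compares a “difficulty” number against a search budget: an agent that cooperates whenever it can’t prove defection is taken in by any defector whose obfuscation pushes the proof past that budget.

```python
C, D = "C", "D"

def proves_within(budget, statement):
    """Stand-in for bounded proof search: a 'statement' here is just a pair
    (difficulty, truth value). Return the truth value if a proof fits in the
    budget, else None, meaning the searcher gave up."""
    difficulty, truth = statement
    return truth if difficulty <= budget else None

def naive_trusting_bot(claim_opponent_cooperates, budget=1000):
    """Cooperate unless we find an outright proof that the opponent defects;
    in particular, cooperate when the search comes back undecided."""
    result = proves_within(budget, claim_opponent_cooperates)
    return D if result is False else C

# An obfuscated defector: it reliably defects, but the shortest proof of that
# is far beyond the searcher's budget, so the naive bot gets exploited.
obfuscated_defect_bot = (10**9, False)
print(naive_trusting_bot(obfuscated_defect_bot))  # prints "C"
```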
that a system that cooperates if it is cooperated with will have no proof?
It will want to have a proof. Here the question is, how good the adversary is at finding proofs in practice. Which is what I asked elsewhere, and I would really like to have an answer.
However, perhaps the system knows its own proof (or its designer knew it). Then it can just write out that proof at the start of its source code where the adversary is sure to see it.
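A toy rendering of that move (all the names here are illustrative, nothing from the paper): the agent carries a certificate that the adversary merely has to check, which shifts the hard work from the adversary’s proof search onto the agent that already knows its own proof.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Certificate:
    claim: str   # e.g. "I cooperate with anyone who provably cooperates with me"
    proof: str   # a machine-checkable derivation of that claim

@dataclass
class Agent:
    source: str
    certificate: Optional[Certificate]  # shipped right at the top of the source

def certificate_checks_out(agent: Agent) -> bool:
    """Stand-in for a proof checker: verifying a supplied proof is cheap even
    when finding one from scratch is not. Here any non-empty proof 'verifies'."""
    return agent.certificate is not None and bool(agent.certificate.proof)

def respond(opponent: Agent) -> str:
    # Cooperate only if the opponent's shipped certificate verifies.
    return "C" if certificate_checks_out(opponent) else "D"

friendly = Agent("...", Certificate("cooperates with cooperators", "<derivation>"))
opaque = Agent("...", None)
print(respond(friendly), respond(opaque))  # C D
```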
See the agents called “WaitDefectBot” and “WaitFairBot” in the section on optimality in the paper. For any modal agent, there’s a pair whose behavior is only decidable in formal systems high enough up the tower that the agent can’t distinguish them, yet the right move would be cooperation against one and defection against the other. And you can make them both arbitrarily complex if you like.
So no, it would not be a good idea in general to cooperate with agents that are undecidable to you, and it would only incentivize agents to be undecidable toward you (and then defect).
This does have the problem that in practice it cashes out as “defect against anyone who is sufficiently smarter than you.”

It depends on how they use their intelligence. For instance, consider the variant of FairBot that appears in the last Masquerade post, which oscillates between seeking proofs of its opponent’s cooperation and defection, up to a high threshold.
The original FairBot, PrudentBot, and Masquerade all reach mutual cooperation with this variant (call it ToughButFairBot), despite its undecidability in general up to PA+N. That’s because if it finds a proof at a lower level, it acts on it immediately, so you can figure out what it does in those particular cases without climbing too high up the tower of formal systems.
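Here is my own toy rendering of that escalation pattern (not the actual construction from the Masquerade post), with the proof search at each level stubbed out: the point is just that a verdict found at a low level is acted on immediately, even though the full agent is only decidable near the top of the tower.

```python
N = 10  # height of the tower PA, PA+1, ..., PA+N

def search_at_level(level, opponent):
    """Stand-in for proof search in PA+level about the opponent's move against
    us: returns "C", "D", or None if that level can't settle it. A toy opponent
    is just (lowest level at which its move is provable, its move)."""
    settled_at, move = opponent
    return move if level >= settled_at else None

def tough_but_fair_bot(opponent):
    for level in range(N + 1):
        verdict = search_at_level(level, opponent)
        if verdict is not None:
            return verdict  # act on the first proof found, at whatever level
    return "D"              # undecided all the way up the tower: defect

print(tough_but_fair_bot((0, "C")))  # a simple cooperator, settled already in PA
print(tough_but_fair_bot((7, "D")))  # only settled at PA+7, but still acted on
```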
The upshot is that you can have higher levels of strategy when necessary, without sacrificing your ability to provably act on lower levels in other cases. So this principle doesn’t cash out as “defect against anyone smarter than you”, but rather as “defect against anyone who refuses to let you figure out how they’re going to respond to you in particular”.