Yet More Modal Combat
Strongly recommended prior reading:
Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic
I think I have come up with a modal agent that has these nice properties:

- Cooperates against itself, and against similar designs.
- Cooperates against some FairBots (see notes later).
- Defects against CooperateBot.
- Never gets the sucker's payoff (C,D).
- Only references the situation it's actually in.
The design makes use of two different proof systems, W and S, short for weak and strong. For example, you could take W = PA and S = PA+1, although W = intuitionistic Robinson arithmetic and S = ZFC+GCH+IO would work too.
The algorithm:

    If W[C] then Defect
    Else if S[C] then Cooperate
    Else Defect

where C = "My opponent cooperates against me."
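To make the control flow concrete, here is a minimal Python sketch. The `proves_weak` and `proves_strong` oracles are hypothetical stand-ins for proof searches in W and S (names mine); this illustrates the branching, not a real implementation of provability.

```python
from typing import Callable

# A prover takes a sentence (named by a string here) and returns
# True iff the corresponding proof system finds a proof of it.
Prover = Callable[[str], bool]

C = "my opponent cooperates against me"

def ws_bot(proves_weak: Prover, proves_strong: Prover) -> str:
    """Decision procedure for the W/S agent described above."""
    if proves_weak(C):      # W[C]: even the weak system proves cooperation,
        return "D"          # so defect and exploit.
    if proves_strong(C):    # S[C] but not W[C]: cooperate.
        return "C"
    return "D"              # Neither system proves cooperation: defect.
```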
The last point is key here. Let's suppose this game is being played by actual computers made of physical atoms. The JustBot and PrudentBot in the paper linked above consider what the opponent would do against a player other than themselves. For example, PrudentBot imagines a hypothetical world where its own code has been deleted and replaced with the code for DefectBot. The hypothetical opponent may notice this apparent anomaly and decide that investigating it is far more important than the prisoner's dilemma.
My bot (any good names in the comments?) solves this by only considering how its opponent behaves against itself.

Consider two versions of this bot playing against each other, not necessarily using the same W and S as each other.
Let's assume all four proof systems in use (W1, S1, W2, S2) are consistent. We also want the stronger condition that none of the proof systems can ever prove a false statement, i.e. that all four are sound. (Although I suspect this can be weakened somewhat.)
Neither bot ever gets the sucker's payoff (C,D), as each bot will only cooperate if its S can prove that the other bot cooperates too, and by soundness that proof can't be wrong. Now suppose W[C] held for either bot: that bot would defect while, again by soundness, its opponent cooperated, handing the opponent the sucker's payoff. We just showed this can't happen, so neither W actually succeeds. If both of the S are sufficiently strong to prove this (S1[¬W2[C1]] and S2[¬W1[C2]]), then the situation reduces to FairBot vs FairBot using proof systems S1 and S2, which cooperates for the standard Löbian reasons. (Note: some technical work remains on when exactly this is true; still, I expect it to hold for most sensible choices of proof system, with the usual assumption that both systems are strong enough to do Löb-style reasoning. See the bottom of page 8 of the linked paper.)
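Writing the no-sucker's-payoff step out explicitly, under the soundness assumption above (notation mine: a_i is bot i's action, C_i the sentence "bot i cooperates against the other bot"):

```latex
\begin{align*}
a_1 = C &\implies S_1 \vdash C_2        && \text{(cooperation only happens via the $S$-branch)}\\
        &\implies C_2 \text{ holds}     && \text{(soundness of $S_1$)}\\
        &\implies a_2 = C,
\end{align*}
% so bot 1 is never the sucker; symmetrically for bot 2. Then
\begin{align*}
W_1 \vdash C_2 &\implies a_1 = D \text{ and } a_2 = C && \text{(first branch fires; soundness of $W_1$)}
\end{align*}
% which is the excluded $(C,D)$ outcome, so $W_1 \nvdash C_2$, and likewise $W_2 \nvdash C_1$.
```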
FairBot can be considered a special case of this bot design, where W is a proof system so weak it doesn't prove anything at all; the first branch then never fires. In this case, FairBot will cooperate if it's using a strong enough proof system.
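In the Python sketch above, that special case is just a weak prover that always fails (again, illustrative names only):

```python
def never_proves(sentence: str) -> bool:
    return False  # a "proof system" so weak it proves nothing at all

# FairBot with proof system S: cooperate iff S proves the opponent cooperates.
def fairbot(proves_in_S: Prover) -> str:
    return ws_bot(proves_weak=never_proves, proves_strong=proves_in_S)
```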
CooperateBot cooperates unconditionally, so even W can prove that it cooperates, and this bot defects against it.
A final note: the same type of reasoning can be done where W and S represent proof searches in the same language (say PA), but with S having a much larger maximum proof length, long enough to prove that W doesn't prove something by brute-forcing over all proofs short enough.
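A minimal sketch of that bounded-search picture, assuming a hypothetical `enumerate_proofs(max_length)` that yields (proof, conclusion) pairs for every PA proof up to the given length:

```python
def bounded_prover(max_length: int) -> Prover:
    """A prover that searches every PA proof of at most max_length symbols.

    With a large enough bound, the strong searcher can decide what the weak
    one proves simply by checking all of the weak searcher's candidate proofs.
    """
    def proves(sentence: str) -> bool:
        return any(conclusion == sentence
                   for _proof, conclusion in enumerate_proofs(max_length))
    return proves

W = bounded_prover(10**6)    # weak: short proofs only
S = bounded_prover(10**12)   # strong: long enough to settle what W proves
```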
Open question: can this work with logical inductors? That is, two logical-inductor-based agents that cooperate if their P(C) tends to one, but not too fast.
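Purely as a guess at what that criterion might look like (the thresholds and stage numbers are arbitrary illustrations, not part of the proposal):

```python
def li_ws_bot(price, n: int) -> str:
    """Speculative analogue: price(k) is a logical inductor's stage-k credence in C.

    "Too fast" convergence plays the role of W[C]; eventual convergence
    plays the role of S[C].
    """
    if price(n // 10) > 0.99:   # confident very early ~ W[C]: exploit
        return "D"
    if price(n) > 0.99:         # confident eventually ~ S[C]: cooperate
        return "C"
    return "D"
```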