Starting with a simpler problem, let’s say that I want to build a bot that cooperates with itself as well as possible and also tries to distribute points as evenly as possible between copies of itself. This is the best that I’ve come up with:
Always play either 2 (low) or 3 (high).
On the first turn, play 2 with probability p and 3 with probability 1-p.
If we got 5 points last turn, then play what my opponent played last turn. (This makes mutual cooperation continue forever, while alternating high-low.)
If we got less than 5 points last turn, then play 2 with probability q and 3 with probability 1-q.
If we got went over 5 last turn, then play 2 with probability r and 3 with probability 1-r.
Let’s call this fairbot(p,q,r).
If my calculations are correct, expected score in self-play is maximized with p=q=r=0.69 (to the nearest .01). On average this brings in a combined 2.24 total points less than the fair maximum (5 per turn) for the two bots, because it might take a few turns to coordinate on which copy plays 3 vs. 2. (The quickest coordination happens with p=q=r=0.5, but that would miss out on 3 points because half the coordination failures cost the whole pot; with more weight on 2 the coordination failure happens more often but usually only shrinks the pot by 1 point.)
Starting to think about how fairbot plays against other bots:
Fairbot never settles into a stable pattern against “always 3” or “always 2“; it continues to switch between 2 and 3 (and plays the same thing as its opponent more often than not). Presumably it would be better it stick with 3s against “always 2” and to either stick with 2s or stick with 3s against “always 3” (depending on whether I prioritize relative or absolute score).
If I knew that fairbot(0.69,0.69,0.69) was my opponent, I think I’d want start by playing 3s, and then settle into mutual alternating coordination once we got a pot of 5. I would do slightly better than this bot does against me or against itself. I could get a much larger relative score by playing always 3, but it would hurt my absolute score by a lot. Or possibly I could do something in between.
An obvious way to make fairbot(p,q,r) more aggressive / less exploitable is to reduce p, q, or especially r.
We can add other rules which cover situations that never arise in self-play, such as when the opponent plays something other than 2 or 3, or when 5-point mutual cooperation gets broken (a non-5 pot after the previous turn had a pot of 5).
It occurs to me that there are a bunch of different patterns that one could aim for to cooperate with oneself. Fairbot aims to play 2,3,2,3,2,3,… (or the complement of this: 3,2,3,2,3,2,...) so that each round has a pot of 5 and the two bots get approximately equal shares. But you could also aim to cooperate with yourself with the pattern 1,4,1,4,1,4,..., or 2,3,3,2,2,3,3,2,2,3,3,..., or 0,5,1,4,2,3,0,5,..., or whatever.
If you want to mutually cooperate with other bots, you either need to pick the Schelling pattern, or plan with the other humans to ensure that you pick the same pattern, or have a bot that’s smart enough to recognize which pattern they’re aiming for and then play it (though that will still leave you less efficient at cooperating than a bot which starts by aiming for that pattern).
A simple way to make a bot nearly inexploitable is to add a payback rule: if my opponent has outscored me thus far, then play 3. At worst, my opponent will outscore me by 1 point for this whole pairing.
This doesn’t interfere with self-cooperation, and it generally shouldn’t interfere with finding mutual fair coordination with bots that are seeking that.
I think I want bot A (master) in my other comment to have this feature.
I think I found a way to do slightly better than .69 for 2 and .31 for 3 when cooperating with yourself:
Play 1 with probability .12, 2 with .60, and 3 with .28. If you played the same number, repeat. If you played different numbers, the higher one can play a 3 and the lower one can play a 2 (or whichever cooperating pattern you choose), and they can start alternating from there.
If my calculations are correct, this only loses 2.10 points on average from the maximum possible, which seems to be the best you can do. The goal of the randomization is to pick a different number than your opponent so that you can start cooperating, so the reason this does better is that by picking out of 3 numbers, you have a higher change of not picking the same number.
Starting with a simpler problem, let’s say that I want to build a bot that cooperates with itself as well as possible and also tries to distribute points as evenly as possible between copies of itself. This is the best that I’ve come up with:
Always play either 2 (low) or 3 (high).
On the first turn, play 2 with probability p and 3 with probability 1-p.
If we got 5 points last turn, then play what my opponent played last turn. (This makes mutual cooperation continue forever, while alternating high-low.)
If we got less than 5 points last turn, then play 2 with probability q and 3 with probability 1-q.
If we got went over 5 last turn, then play 2 with probability r and 3 with probability 1-r.
Let’s call this fairbot(p,q,r).
If my calculations are correct, expected score in self-play is maximized with p=q=r=0.69 (to the nearest .01). On average this brings in a combined 2.24 total points less than the fair maximum (5 per turn) for the two bots, because it might take a few turns to coordinate on which copy plays 3 vs. 2. (The quickest coordination happens with p=q=r=0.5, but that would miss out on 3 points because half the coordination failures cost the whole pot; with more weight on 2 the coordination failure happens more often but usually only shrinks the pot by 1 point.)
Starting to think about how fairbot plays against other bots:
Fairbot never settles into a stable pattern against “always 3” or “always 2“; it continues to switch between 2 and 3 (and plays the same thing as its opponent more often than not). Presumably it would be better it stick with 3s against “always 2” and to either stick with 2s or stick with 3s against “always 3” (depending on whether I prioritize relative or absolute score).
If I knew that fairbot(0.69,0.69,0.69) was my opponent, I think I’d want start by playing 3s, and then settle into mutual alternating coordination once we got a pot of 5. I would do slightly better than this bot does against me or against itself. I could get a much larger relative score by playing always 3, but it would hurt my absolute score by a lot. Or possibly I could do something in between.
An obvious way to make fairbot(p,q,r) more aggressive / less exploitable is to reduce p, q, or especially r.
We can add other rules which cover situations that never arise in self-play, such as when the opponent plays something other than 2 or 3, or when 5-point mutual cooperation gets broken (a non-5 pot after the previous turn had a pot of 5).
It occurs to me that there are a bunch of different patterns that one could aim for to cooperate with oneself. Fairbot aims to play 2,3,2,3,2,3,… (or the complement of this: 3,2,3,2,3,2,...) so that each round has a pot of 5 and the two bots get approximately equal shares. But you could also aim to cooperate with yourself with the pattern 1,4,1,4,1,4,..., or 2,3,3,2,2,3,3,2,2,3,3,..., or 0,5,1,4,2,3,0,5,..., or whatever.
If you want to mutually cooperate with other bots, you either need to pick the Schelling pattern, or plan with the other humans to ensure that you pick the same pattern, or have a bot that’s smart enough to recognize which pattern they’re aiming for and then play it (though that will still leave you less efficient at cooperating than a bot which starts by aiming for that pattern).
A simple way to make a bot nearly inexploitable is to add a payback rule: if my opponent has outscored me thus far, then play 3. At worst, my opponent will outscore me by 1 point for this whole pairing.
This doesn’t interfere with self-cooperation, and it generally shouldn’t interfere with finding mutual fair coordination with bots that are seeking that.
I think I want bot A (master) in my other comment to have this feature.
I think I found a way to do slightly better than .69 for 2 and .31 for 3 when cooperating with yourself:
Play 1 with probability .12, 2 with .60, and 3 with .28. If you played the same number, repeat. If you played different numbers, the higher one can play a 3 and the lower one can play a 2 (or whichever cooperating pattern you choose), and they can start alternating from there.
If my calculations are correct, this only loses 2.10 points on average from the maximum possible, which seems to be the best you can do. The goal of the randomization is to pick a different number than your opponent so that you can start cooperating, so the reason this does better is that by picking out of 3 numbers, you have a higher change of not picking the same number.