I should have mentioned that I was talking about a 1v1 equilibrium of strategies that maximizes your expected score.
DefectBot is such an equilibrium strategy for the vanilla iterated-PD with fixed time horizon, but not if you allow simulation.
DefectBot maximises expected score against DefectBot, because you can’t do better vs DefectBot than defect every round and get 100 points. As such, DefectBot vs DefectBot is still a Nash equilibrium.
You need to be more specific as to what “maximizes your expected score” means, because depending on your definition I could come up with some very surprising strategies you might not be expecting.
By “score” I mean the sum of the payoffs that your bot receives during each round in a single 1v1 match.
By “expected” I mean w.r.t. both the internal random source of each bot and the random source you use to choose your bot if you are using a mixed strategy.
DefectBot gets the maximum possible expected score of 100 points vs DefectBot; it’s not possible to do better vs DefectBot.
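To make that arithmetic concrete, here is a minimal sketch. It assumes a 100-round match and the usual per-round payoffs T=5, R=3, P=1, S=0; those numbers are my assumption, chosen because they match the 100-point figure above. The names `PAYOFF`, `ROUNDS` and `score_vs_defectbot` are just for illustration.

```python
# Assumed per-round payoffs, keyed by (my_move, their_move) -> my points.
# T=5, R=3, P=1, S=0 is an assumption consistent with the scores quoted in the thread.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ROUNDS = 100  # assumed match length


def score_vs_defectbot(my_moves):
    """Total score of a fixed move sequence against a bot that always defects."""
    return sum(PAYOFF[(move, "D")] for move in my_moves)


# Defecting every round earns P=1 per round.
all_defect = score_vs_defectbot("D" * ROUNDS)

# Each cooperation swaps a 1-point round for a 0-point round, so it can only hurt.
one_coop = score_vs_defectbot("C" + "D" * (ROUNDS - 1))
all_coop = score_vs_defectbot("C" * ROUNDS)

print(all_defect, one_coop, all_coop)
```

Since DefectBot ignores what you do, each round can be optimised on its own, which is why checking one round at a time is enough here.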
Yes, but there are other equilibria where both players get a higher score.
Yes, and which one of those equilibria do you pick?
That’s a coordination problem.
In program-swap (I)PD, these max-payoff equilibria are the CliqueBot equilibria, and there doesn’t seem to be any “natural” clique to pick as a Schelling point. In program-simulation IPD, by contrast, there does seem to be a Schelling point.
The term “max-payoff equilibrium” is ill-defined.
Consider this pair of bots:
1) ExtortionBot, who defects for 80 rounds, and then cooperates with you for 20 rounds if and only if you cooperated for all of the first 80 (otherwise it defects for those rounds as well).
2) WeakBot, who always defects for the last 20 rounds, and cooperates with you for the first 80 if and only if it simulates that you will cooperate for the last 20 rounds iff WeakBot cooperates with you for the first 80 (otherwise it defects for the first 80).
The maximum score you can get vs ExtortionBot is 100 points, which is how many points WeakBot gets.
The maximum score you can get vs WeakBot is 400 points, which is how many points ExtortionBot gets.
Ergo ExtortionBot/WeakBot forms a Nash Equilibrium. Is that a max-payoff equilibrium, or is it not?
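A rough sketch of where the 400 and 100 come from, assuming payoffs T=5, R=3, P=1, S=0 (my assumption, but it reproduces both figures). It does not model WeakBot's simulation step: WeakBot's moves are hard-coded as what it ends up playing against ExtortionBot, and the best-reply check only covers the four "block" replies (one action for the first 80 rounds, one for the last 20), not every possible deviation. All names are illustrative.

```python
# Assumed per-round payoffs, keyed by (my_move, their_move) -> my points.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
FIRST, LAST = 80, 20  # the 80/20 split described above


def score(mine, theirs):
    """Total score for the player whose moves are `mine`."""
    return sum(PAYOFF[(m, t)] for m, t in zip(mine, theirs))


def extortion_bot(opp_first):
    """ExtortionBot's full move sequence given the opponent's first-80 moves."""
    reward = "C" if all(m == "C" for m in opp_first) else "D"
    return "D" * FIRST + reward * LAST


# Realized play in the ExtortionBot/WeakBot pairing (simulation step not modelled).
weak_moves = "C" * FIRST + "D" * LAST
ext_moves = extortion_bot(weak_moves[:FIRST])

ext_score = score(ext_moves, weak_moves)   # 80 rounds of D vs C, then 20 of C vs D
weak_score = score(weak_moves, ext_moves)  # 80 rounds of C vs D, then 20 of D vs C

# Score of each block reply against ExtortionBot, e.g. "CD" = cooperate 80, defect 20.
block_replies = {
    a + b: score(a * FIRST + b * LAST, extortion_bot(a * FIRST))
    for a in "CD"
    for b in "CD"
}

print(ext_score, weak_score, block_replies)
```

Among the block replies, both "cooperate 80 then defect 20" and "defect throughout" reach 100 against ExtortionBot, so WeakBot's 100 is not beaten by any of them.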
That means that this game is a symmetric bargaining problem.
According to Wikipedia, proposed solutions are symmetric, Pareto-optimal (i.e. “max-payoff”) equilibria.
It seems to me that VOFB or something similar to it is a strategy leading to one of these equilibria (do other symmetric Pareto-optimal equilibria exist?)
I think there are many such equilibria, but they all rely on the same basic principle. I’ve clarified this in a top-level comment, along with a simple example of a symmetric Pareto-optimal equilibrium strategy.