I don’t think the ‘strategy’ used here (set to 99 degrees unless someone defects, then set to 100) satisfies the “individual rationality condition”. Sure, when everyone is setting it to 99 degrees, it beats the minmax strategy of choosing 30. But once someone chooses 30, the minmax for everyone else is now to also choose 30: there’s no further punishment that will or could be given. So the behavior described here, where everyone punishes the 30, is worse than minmaxing. At the very least, it would be an unstable equilibrium that would have broken down in the situation described, and knowing that would give everyone an incentive to ‘defect’ immediately.
After someone chooses 30 once, they still get to choose something different in future rounds. In the strategy profile I claim is a Nash equilibrium, they’ll set it to 100 next round like everyone else. If anyone individually deviates from setting it to 100, then the equilibrium temperature in the next round will also be 100. That simply isn’t worth it, if you expect to be the only person setting it to less than 100. Since in the strategy profile I am constructing everyone does set it to 100, that’s the condition we need to check to verify whether it’s a Nash equilibrium.
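Here’s a minimal numeric sketch of that check, under assumptions the numbers imply but the post doesn’t state outright: 100 players, room temperature equal to the average of the dials, per-round payoff equal to minus the temperature, a discount factor, and a one-round punishment at 100 that restarts whenever anyone deviates from it.

```python
# One-shot deviation checks for the profile "play 99; after any
# deviation, everyone plays 100 for one round, restarting that round
# if anyone deviates from 100". Assumed setup (not stated in the post):
# 100 players, temperature = average dial, payoff = -temperature.
N, DELTA = 100, 0.9

def temp(my_dial, others_dial):
    # room temperature when the other N-1 players all set others_dial
    return (my_dial + (N - 1) * others_dial) / N

v_coop = -99 / (1 - DELTA)         # comply forever: everyone at 99
v_punish = -100 + DELTA * v_coop   # comply in the punishment round,
                                   # then back to 99 forever

# deviate to 30 on the equilibrium path, then get punished and conform
dev_coop = -temp(30, 99) + DELTA * v_punish
# deviate to 30 during the punishment round, which restarts it
dev_punish = -temp(30, 100) + DELTA * v_punish

print(dev_coop <= v_coop)       # True: deviating from 99 doesn't pay
print(dev_punish <= v_punish)   # True: deviating from 100 doesn't pay
```

Under these assumptions both checks pass exactly when the discount factor is at least 0.7 (the punishment-phase check is the binding one), so the construction hinges on the prisoners caring enough about future rounds.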
I guess the unstated assumption is that the prisoners can only see the temperatures of others from the previous round and/or can only change their temperature at the start of a round (though one tried to do otherwise in the story). Even with that, it seems like an awfully precarious equilibrium: if I unilaterally start choosing 30 repeatedly, you’d have to be stupid not to also start choosing 30, and the cost to me is really quite tiny (0.3 degrees per round, −99.3 instead of −99) even while no one else ever ‘defects’ alongside me. It seems to be too weak a definition of ‘equilibrium’ if it’s that easy to break; maybe there’s a more realistic definition that excludes this case?
The other thing that could happen is silent deviations, where some players aren’t doing “punish any defection from 99”; they are just doing “play 99” to avoid punishments. The one brave soul doesn’t know how many of each there are, but can find out when they suddenly go for 30.
The ‘individual rationality condition’ is about the payoffs in equilibrium, not about the strategies. It says that the equilibrium payoff profile must yield to each player at least their minmax payoff. Here, the minmax payoff for a given player is −99.3 (which comes from the player best responding with 30 forever to everyone else setting their dials to 100 forever). The equilibrium payoff is −99 (which comes from everyone setting their dials to 99 forever). Since −99 > −99.3, the individual rationality condition of the Folk Theorem is satisfied.
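For concreteness, a quick check of both numbers under the same assumed setup as above (100 players, temperature equal to the average dial, payoff equal to minus the temperature):

```python
# Sanity check of the minmax and equilibrium payoffs.
N = 100

minmax_payoff = -(30 + (N - 1) * 100) / N   # best response 30 vs. all-100
equilibrium_payoff = -99.0                  # everyone at 99

print(minmax_payoff)                        # -99.3
print(equilibrium_payoff > minmax_payoff)   # True: IR condition holds
```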
I think the “at least” is an important part of this. If the equilibrium yields more than their minmax payoff, whether because the opponents are making mistakes, or have different payoffs than you think, or are just cruelly trying to break your model, there’s no debt created, because there’s no cost to recoup.
The minmax expectation is a temperature of 99.3, i.e. a payoff of −99.3 (the player sets to 30 and everyone else to 100). One possible bargaining/long-term repeated equilibrium is 99, where everyone chooses 99 and punishes anyone who sets to 100 by setting themselves to 100 for some time. But it would be just as valid to expect the long-term equilibrium to be 30, and to punish anyone who sets to 31 or higher. I couldn’t tell from the paper how much communication was allowed between players, but it seems to assume some mutual knowledge of each other’s utility and of what a given level of “punishment” achieves.
In no case do you need to punish someone who’s unilaterally giving you something BETTER than your long-term equilibrium expectation.
Oh yeah, the Folk Theorem is totally consistent with the Nash equilibrium of the repeated game here being ‘everyone plays 30 forever’, since the payoff profile ‘-30 for everyone’ is feasible and individually-rational. In fact, this is the unique NE of the stage game and also the unique subgame-perfect NE of any finitely repeated version of the game.
To sustain ‘-30 for everyone forever’, I don’t even need a punishment for off-equilibrium deviations. The strategy for everyone can just be ‘unconditionally play 30 forever’ and there is no profitable unilateral deviation for anyone here.
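This is easy to verify: under the assumed payoff structure, a player’s dial only adds to the temperature, so 30 dominates every other setting no matter what anyone else does. A sketch:

```python
# 30 is a dominant action of the stage game, so "unconditionally
# play 30" needs no punishment threat at all.
N = 100

def payoff(my_dial, others_sum):
    return -(my_dial + others_sum) / N

for others_sum in [(N - 1) * 30, (N - 1) * 99, (N - 1) * 100]:
    assert all(payoff(30, others_sum) >= payoff(d, others_sum)
               for d in range(31, 101))
print("30 dominates every dial setting from 31 to 100")
```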
The relevant Folk Theorem here just says that any feasible and individually-rational payoff profile of the stage game (i.e. setting dials at a given time) is a Nash equilibrium payoff profile of the infinitely repeated game, provided the players are sufficiently patient. Here, that’s everything in the interval [−99.3, −30] for a given player. The theorem itself doesn’t really help constrain our expectations about which of the possible Nash equilibria will in fact be played in the game.
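As a sketch of how the theorem’s construction plays out here (same assumed setup as before): any target temperature in [30, 99.3) can be held up by grim trigger, i.e. everyone switches to 100 forever after any deviation. For a Nash (as opposed to subgame-perfect) equilibrium the threat doesn’t have to be credible off-path; it only has to deter on-path deviations, and the tempting deviation is always dropping to 30:

```python
# Minimal discount factor at which grim trigger sustains a target
# temperature t: the one-round gain from deviating to 30 must be
# outweighed by the discounted per-round loss of being minmaxed
# at 99.3 forever afterwards.
N = 100

def min_discount(t):
    gain = (t - 30) / N   # deviator lowers the average by this much
    loss = 99.3 - t       # -99.3 forever instead of -t forever
    return gain / (gain + loss)

for t in [30, 60, 99]:
    print(t, round(min_discount(t), 3))   # 30 -> 0.0, 99 -> ~0.697
```

The closer the target sits to the minmax temperature of 99.3, the more patience it takes to sustain; everything in between is an equilibrium for patient enough players, which is exactly the multiplicity the theorem leaves unresolved.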