The goal of this post is to help us understand the similarities and differences between several different games, and to improve our intuitions about which game is the right default assumption when modeling real-world outcomes.
My main objective with this review is to check the game-theoretic claims, identify the points at which this post makes empirical assertions, and see if there are any worrisome oversights or gaps. Most of my fact-checking will just be resorting to Wikipedia.
Let’s start with definitions of two key concepts.
Pareto-optimal: No player’s outcome can be improved without making another player’s outcome worse.
Nash equilibrium: No player can do better by unilaterally changing their strategy.
Here’s the payoff matrix from the one-shot Prisoner’s Dilemma and how it relates to these key concepts.
| | B stays silent | B betrays |
| --- | --- | --- |
| **A stays silent** | Pareto-optimal | |
| **A betrays** | | Nash equilibrium |
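To make the two definitions concrete, here is a minimal sketch that enumerates the outcomes of a one-shot PD and labels each one as a Nash equilibrium and/or Pareto-optimal. The payoff numbers (5, 3, 1, 0) are the conventional textbook values, my assumption rather than anything taken from the original post. It also shows that, under the strict definition, the two asymmetric outcomes come out Pareto-optimal as well; the table above highlights only the two cells relevant to the argument.

```python
from itertools import product

ACTIONS = ("silent", "betray")

# payoffs[(a, b)] = (A's payoff, B's payoff); conventional PD values,
# assumed here for illustration rather than taken from the post.
payoffs = {
    ("silent", "silent"): (3, 3),
    ("silent", "betray"): (0, 5),
    ("betray", "silent"): (5, 0),
    ("betray", "betray"): (1, 1),
}

def is_nash(a, b):
    """No player can do better by unilaterally changing their action."""
    best_a = all(payoffs[(a2, b)][0] <= payoffs[(a, b)][0] for a2 in ACTIONS)
    best_b = all(payoffs[(a, b2)][1] <= payoffs[(a, b)][1] for b2 in ACTIONS)
    return best_a and best_b

def is_pareto_optimal(a, b):
    """No other outcome helps one player without hurting the other."""
    ua, ub = payoffs[(a, b)]
    return not any(va >= ua and vb >= ub and (va > ua or vb > ub)
                   for va, vb in payoffs.values())

for a, b in product(ACTIONS, repeat=2):
    print(f"A {a:>6} / B {b:>6}: Nash={is_nash(a, b)}, "
          f"Pareto-optimal={is_pareto_optimal(a, b)}")
```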
This article outlines three possible relationships between Pareto-optimality and Nash equilibrium.
1. There are no Pareto-optimal Nash equilibria.
2. There is a single Pareto-optimal Nash equilibrium, and another equilibrium that is not Pareto-optimal.
3. There are multiple Pareto-optimal Nash equilibria, which benefit different players to different extents.
The author attempts to argue which of these arrangements best describes the world we live in, and therefore which makes the best default assumption when interpreting real-world situations as games. The claim is that real-world situations most often resemble iterated PDs, which have multiple Pareto-optimal Nash equilibria benefiting different players to different extents. I will attempt to show that the author’s conclusion only applies when modeling superrational entities, or entities with an unbounded lifespan, and give some examples where this might be relevant.
The iterated Prisoner’s Dilemma is a little more complex than the author states. If the players know how many turns the game will last, or if the game has a known upper limit on the number of turns, the Nash equilibrium is to always defect. However, if the players are superrational, meaning that they are not only perfectly rational but also assume that all other players are too, and that superrational players always converge on the same strategy, then they’ll always cooperate.
As such, for rational but not superrational players, the Nash equilibrium in games with a fixed or upper-bounded number of turns N is the same as in the single-shot game. In real life, any game played between human beings that takes a non-zero amount of time per turn has an upper bound on the number of turns, given that we currently must expect ourselves to die. Game theory therefore suggests that the Nash equilibrium strategy for all iterated Prisoner’s Dilemmas between rational players is defect/defect, and the claims about iterated PD in steps 2-5 of the author’s argument summary only seem to hold if we are talking about non-human entities with unbounded life expectancies, or if humans are modeled as superrational agents.
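As a sanity check on the backward-induction claim, here is a minimal sketch, again assuming the conventional payoffs rather than anything from the post. With any fixed number of rounds remaining, the continuation play is the same regardless of the current move, so each round reduces to the one-shot game, in which betrayal strictly dominates.

```python
# Row player's one-shot payoffs; conventional PD values, assumed for
# illustration.
PAYOFF = {("silent", "silent"): 3, ("silent", "betray"): 0,
          ("betray", "silent"): 5, ("betray", "betray"): 1}

def best_reply(opponent_action):
    """Best one-shot response, ignoring the (already fixed) future."""
    return max(("silent", "betray"),
               key=lambda a: PAYOFF[(a, opponent_action)])

def equilibrium_path(n_rounds):
    """Unravel from the final round backward: betrayal dominates in the
    last round and, given that, in every earlier round as well."""
    path = []
    for _ in range(n_rounds):
        assert best_reply("silent") == "betray"
        assert best_reply("betray") == "betray"
        path.append(("betray", "betray"))
    return path

print(equilibrium_path(3))  # betray/betray in every round
```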
Let’s gesture at some plausible but extremely speculative real-world examples of how games with an unbounded number of turns, or with superrational players, might be reasonable models for human games.
Social entities, such as governments, corporations, and cultures, could be seen as having unbounded lifespans if they can be modeled as agents. When we make strict game-theoretic arguments, we should be equally strict in our mathematical assumptions about other facets of reality. If our understanding of physics is imperfect, and there is a non-zero probability that some agents persist not just into the distant future but infinitely far into time, then those agents can be modeled as having unbounded lifespans. If this holds, and if social entities are causally responsible for the way most games unfold, then this may rescue the argument.
A second angle on this idea is that humans may have a psychological tendency to treat long periods of time as infinite, analogously to how people intuitively round low probabilities to 0 and high probabilities to 1. This suggests that the strategies people empirically choose when playing iterated PDs in the lab should increasingly approach the optimal strategies for unbounded iterated PDs as the turn count increases. A complicating factor here is that we must disambiguate whether the assumption is that humans “round” long lengths of time, or large numbers of turns, to infinity.
Outcomes for superrational agents are better than those for rational agents in the turn-bounded iterated PD. Imagine a large population of agents, each playing a randomly chosen strategy, some of which happen to match the superrational strategy. If the population faces selection pressure over time, with some spontaneous generation of new agents, agents playing the superrational strategy may eventually come to dominate it. Conceivably, humans, and our social ancestor species, were genetically hardwired via group selection to adopt superrational strategies by instinct. This aligns with the psychological explanation.
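Here is a toy version of that selection story, with the strategy set, payoffs, and starting shares all being illustrative assumptions of mine: a population plays a fixed-length iterated PD, and each strategy’s share grows in proportion to its average payoff against the current mix. Tit-for-tat, used here as a stand-in for the cooperative strategy (note that it is not a Nash equilibrium of the fixed-length game), overtakes always-defect once it is common enough to frequently meet copies of itself.

```python
R, S, T, P = 3, 0, 5, 1   # conventional PD payoffs (assumed)
ROUNDS = 10
STAGE = {("C", "C"): (R, R), ("C", "D"): (S, T),
         ("D", "C"): (T, S), ("D", "D"): (P, P)}

def play(strat_a, strat_b):
    """Total payoffs from one fixed-length match."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(ROUNDS):
        a, b = strat_a(hist_b), strat_b(hist_a)
        pa, pb = STAGE[(a, b)]
        score_a += pa; score_b += pb
        hist_a.append(a); hist_b.append(b)
    return score_a, score_b

always_defect = lambda opp: "D"
tit_for_tat = lambda opp: "C" if not opp else opp[-1]

strategies = {"AllD": always_defect, "TFT": tit_for_tat}
shares = {"AllD": 0.9, "TFT": 0.1}   # cooperators start rare

for generation in range(30):
    # Each strategy's expected score against the current population mix.
    fitness = {n: sum(shares[m] * play(s, strategies[m])[0]
                      for m in strategies)
               for n, s in strategies.items()}
    mean = sum(shares[n] * fitness[n] for n in strategies)
    shares = {n: shares[n] * fitness[n] / mean for n in strategies}

print(shares)  # TFT's share approaches 1
```

In this toy model, tit-for-tat’s share grows whenever it exceeds roughly 6% of the population, so the interesting question becomes how cooperators get common enough in the first place, which is where the group-selection story would have to do its work.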
Edit: Vanessa Kosoy pointed out below that iterated PDs with a finite but unknown number of iterations, or slightly noisy agents, can have Nash equilibria involving cooperation. We therefore don’t need to resort to such exotic explanations as I’ve offered here to explain how abramdemski’s arguments 2-5 hold, and we don’t need to “trick ourselves into cooperation” in such scenarios.
If this argument holds water, how does it affect the original agenda of this article, which was to inform our intuitions about how to model real-world games? It suggests that these twin questions, of psychological “rounding” and a possible group-selection account of how this might have evolved, would be important to investigate to increase our confidence in this heuristic. If true, it also suggests vulnerabilities in normal human approaches to games. Our ability to cooperate, under this hypothesis, depends on our ability to trick ourselves into cooperation by conveniently ignoring the inevitable end of our games.
The local argument made by this post needn’t be true for its conclusion to be true. If the post seems plausible because it aligns with our real-world experience, we might want to appreciate the article for the conclusion it articulates as well as for the argument it makes in support of that conclusion. The conclusion is that in most situations we face multiple Pareto-optimal Nash equilibria, favoring different agents. Colloquially, many human problems are about fairness and resource allocation: the threats and strategies people use to steer negotiation toward the outcome that favors them most, while still achieving a fundamentally cooperative outcome.
This seems to me like an articulate, usefully predictive, simple, and realistic depiction of an enormous number of fundamental challenges in human organization. Although I don’t think the original post’s game-theoretic argument is airtight, its psychological and sociological plausibility, in conjunction with a tweaked game-theoretic argument, makes it worthwhile and interesting. I also appreciate the care the author took to summarize, update, and respond to comments. Pointing out the similarities and differences in the relationship between Nash equilibria and Pareto optimality across the various games also helped me understand them better.
Cooperation can be a Nash equilibrium in the IPD if you have a finite but unknown number of iterations (e.g. geometrically distributed). Also, if the number of iterations is known but very large, cooperating becomes an ϵ-Nash equilibrium for small ϵ (if we normalize utility by its maximal value), so agents which are not superrational but a little noisy can still converge there (and, agents are sometimes noisy by design in order to facilitate exploration).
Thank you for pointing this out. Here’s a source for the first claim.
Finitely repeated games with an unknown or indeterminate number of time periods, on the other hand, are regarded as if they were an infinitely repeated game. It is not possible to apply backward induction to these games.
And here’s a source that at least provides a starting point for the second claim about ϵ-Nash equilibria.
Given a game and a real non-negative parameter ϵ, a strategy profile is said to be an ϵ-equilibrium if it is not possible for any player to gain more than ϵ in expected payoff by unilaterally deviating from his strategy. Every Nash Equilibrium is equivalent to an ϵ-equilibrium where ϵ = 0.
Another simple example is the finitely repeated prisoner’s dilemma for T periods, where the payoff is averaged over the T periods. The only Nash equilibrium of this game is to choose Defect in each period. Now consider the two strategies tit-for-tat and grim trigger. Although neither tit-for-tat nor grim trigger are Nash equilibria for the game, both of them are ϵ-equilibria for some positive ϵ. The acceptable values of ϵ depend on the payoffs of the constituent game and on the number T of periods.
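To put rough numbers on both claims, here is a worked sketch assuming the conventional payoffs T=5 (temptation), R=3 (reward), P=1 (punishment), S=0 (sucker), which are my assumption and not from the quoted sources. It computes the continuation probability needed for grim trigger to be a Nash equilibrium under geometric stopping, and then the ϵ for which grim trigger is an ϵ-equilibrium of the known-horizon averaged game.

```python
T_, R_, P_, S_ = 5, 3, 1, 0   # temptation, reward, punishment, sucker (assumed)

# 1. Unknown horizon, geometric continuation probability d: cooperating
# forever against grim trigger is worth R/(1-d); the best deviation is
# worth T + d*P/(1-d). Cooperation is a Nash equilibrium when
#   R/(1-d) >= T + d*P/(1-d)   <=>   d >= (T - R) / (T - P).
threshold = (T_ - R_) / (T_ - P_)
print(f"cooperation sustainable when continuation prob >= {threshold}")  # 0.5

# 2. Known horizon of n rounds, payoff averaged over rounds: against
# grim trigger, the best deviation is to defect in the final round only,
# gaining (T - R)/n on average, so grim trigger is an eps-equilibrium
# with eps shrinking as the horizon grows.
for n in (10, 100, 1000):
    print(f"n = {n:>4}: eps = {(T_ - R_) / n}")
```

This matches the quoted passage: the acceptable values of ϵ depend on the stage-game payoffs and shrink as the number of periods grows.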