In the previous “population game” setting, we assumed that all players are “born” at the same time and learn synchronously, so that they always play against players of the same “age” (history length). Instead, we can consider a “mortal population game” setting where each player has probability 1−γ of dying on every round, and new players are born to replenish the dead. So, if the size of the population is N (we always consider the “thermodynamic” N→∞ limit), N(1−γ) players die and the same number of players are born on every round. Each player’s utility function is a simple sum of rewards over time, so, taking mortality into account, ey effectively have geometric time discount. (We could use age-dependent mortality rates to get different discount shapes, or allow each type of player to have a different mortality rate, and hence a different discount rate.) Crucially, we group the players into games randomly, independent of age.
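The step from mortality to geometric discounting can be written out explicitly. The following is a short derivation under two assumptions that the text does not spell out: a newborn plays its first round (age t = 0) with certainty, with the death draw happening after each round, and death is independent of the rewards. The symbol r_t is a hypothetical name for the reward received at age t, assumed bounded so that the sum and the expectation can be exchanged.

```latex
\Pr[\text{alive at age } t] = \gamma^{t},
\qquad
\mathbb{E}\Big[\sum_{t=0}^{\infty} \mathbf{1}[\text{alive at age } t]\; r_t\Big]
  = \sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{E}[r_t],
\qquad
\mathbb{E}[\text{rounds played}] = \sum_{t=0}^{\infty} \gamma^{t} = \frac{1}{1-\gamma}.
```

So an undiscounted sum over a random lifetime has the same expectation as a γ-discounted sum over an infinite lifetime, and γ→1 corresponds to long expected lifetimes.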
As before, each player type i chooses a policy πi:On→ΔAi. (We can also consider the case where players of the same type may have different policies, but let’s keep it simple for now.) In the thermodynamic limit, the population is described as a distribution over histories, which now are allowed to be of variable length: μn∈ΔO∗. For each assignment of policies to player types, we get dynamics μn+1=Tπ(μn) where Tπ:ΔO∗→ΔO∗. So, as opposed to immortal population games, mortal population games naturally give rise to dynamical systems.
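For intuition, one round of the dynamics can be approximated with a finite population. The following is a minimal sketch, not the thermodynamic-limit operator itself, and it makes several simplifying assumptions that are not in the text: a single player type, two-player games, an even population size, and an observation that consists only of the opponent’s action. The names `step_population` and `policy` are hypothetical.

```python
import random


def step_population(histories, policy, gamma, rng):
    """One round of a finite-N stand-in for the map T_pi.

    histories: list of tuples, one per player (tuple length = age).
    policy:    function (history, rng) -> action.
    gamma:     per-round survival probability.
    Assumes an even number of players so that everyone is matched.
    """
    order = list(range(len(histories)))
    rng.shuffle(order)  # random matching, independent of age
    actions = [policy(h, rng) for h in histories]
    updated = list(histories)
    for i, j in zip(order[0::2], order[1::2]):
        # each player observes the action of the opponent it was matched with
        updated[i] = histories[i] + (actions[j],)
        updated[j] = histories[j] + (actions[i],)
    # each player dies with probability 1 - gamma and is replaced by a
    # newborn with an empty history, so the population size stays constant
    return [h if rng.random() < gamma else () for h in updated]
```

Iterating `step_population` from any initial list of histories gives an empirical approximation of the sequence of population states; the approximation is only expected to be good for large populations.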
If we consider only the age distribution, then its evolution doesn’t depend on π and it always converges to the unique fixed point distribution ζ(k)=(1−γ)γ^k. Therefore it is natural to restrict the dynamics to the subspace of ΔO∗ that corresponds to the age distribution ζ. We denote it P.
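The convergence of the age distribution can be checked numerically. The sketch below iterates the age dynamics on a truncated list of age buckets; the truncation (folding all sufficiently old players into one overflow bucket) and the function names are assumptions made for illustration.

```python
def step_age_distribution(age_dist, gamma):
    """Advance the age distribution by one round.

    age_dist[k] is the fraction of players of age k. The last bucket
    collects every age >= len(age_dist) - 1, so total mass is conserved.
    """
    new = [0.0] * len(age_dist)
    new[0] = 1.0 - gamma                    # newborns replace the dead
    for k in range(len(age_dist) - 1):
        new[k + 1] += gamma * age_dist[k]   # survivors age by one
    new[-1] += gamma * age_dist[-1]         # overflow survivors stay in the last bucket
    return new


def geometric_ages(gamma, num_buckets):
    """zeta(k) = (1 - gamma) * gamma**k, with the tail mass in the last bucket."""
    zeta = [(1.0 - gamma) * gamma ** k for k in range(num_buckets - 1)]
    zeta.append(gamma ** (num_buckets - 1))
    return zeta


def iterate_ages(age_dist, gamma, rounds):
    """Apply step_age_distribution `rounds` times."""
    for _ in range(rounds):
        age_dist = step_age_distribution(age_dist, gamma)
    return age_dist
```

Starting from a population of newborns, for example `iterate_ages([1.0] + [0.0] * 19, 0.8, 200)`, the result can be compared against `geometric_ages(0.8, 20)`; the mass that has not yet been replaced shrinks by a factor of γ each round.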
Does the dynamics have fixed points? O∗ can be regarded as a subspace of (O⊔{⊥})^ω. The latter is compact (in the product topology) by Tychonoff’s theorem and Polish, but O∗ is not closed. So, w.r.t. the weak topology on probability measure spaces, Δ(O⊔{⊥})^ω is also compact but ΔO∗ isn’t. However, it is easy to see that P is closed in Δ(O⊔{⊥})^ω and therefore compact. It may also be regarded as a convex subset of an appropriate Banach space (the dual of the space of Lipschitz functions on some metrization of (O⊔{⊥})^ω). Moreover, it is easy to see Tπ is continuous (for populations that are close in the Kantorovich-Rubinstein metric, only the old players may have very different distributions, but old players are a small fraction of the population so their effect on the next round is small). By the Schauder fixed-point theorem, it follows that Tπ has a fixed point.
What are the fixed points like? Of course it depends on π. In a fixed point, every player observes a sequence of IID plays in all of eir games. Therefore, if π satisfies the (very mild!) learning-theoretic desideratum that, upon observing an IID sequence, it converges to optimal response in the γ→1 limit, then, in the same limit, fixed points are Nash equilibria. This works even for extremely simple learning algorithms, such as “assume the plays in the next game will be sampled from a random past game”, and it works for any Bayesian or “quasi-Bayesian” (i.e. using incomplete/fuzzy models) agent that includes all IID processes in its prior.
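The simple learning rule quoted above can be sketched as follows, reading it as “forecast the next opponent play with the empirical distribution of past plays, then best-respond”. This is an illustrative interpretation, not a definitive implementation: it assumes a two-player game, an observation that is just the opponent’s action, and a myopic best response to the forecast. The function name and the payoff-matrix layout are hypothetical.

```python
import random


def empirical_best_response(payoff, history, rng):
    """Best response to the empirical distribution of past opponent plays.

    payoff[a][b]: reward for playing a against opponent action b.
    history:      sequence of observed opponent actions (column indices).
    With no observations yet, an action is chosen uniformly at random.
    Ties are broken in favour of the lowest action index.
    """
    num_actions = len(payoff)
    if not history:
        return rng.randrange(num_actions)
    counts = [0] * len(payoff[0])
    for b in history:
        counts[b] += 1
    # Expected payoff against the empirical distribution, up to the
    # constant factor 1 / len(history), which does not change the argmax.
    scores = [
        sum(payoff[a][b] * counts[b] for b in range(len(counts)))
        for a in range(num_actions)
    ]
    return max(range(num_actions), key=lambda a: scores[a])
```

On an IID sequence of opponent plays, the empirical counts concentrate around the true distribution as the history grows, so the chosen action is eventually a best response whenever the best response is unique.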
This raises a range of interesting questions:
Are any/all of the fixed points attractors?
Does convergence to a fixed point occur for all or at least almost all initial conditions?
Do all Nash equilibria correspond to fixed points?
Do stronger game theoretic solution concepts (e.g. proper equilibria) have corresponding dynamical properties?
Mortal population games are obviously reminiscent of evolutionary game theory. However, there are substantial differences. In mortal population games, the game doesn’t have to be symmetric, we consider a single policy rather than many competing policies, the policies learn from experience instead of corresponding to fixed strategies, and the mortality rate doesn’t depend on the reward. In evolutionary game theory, convergence usually cannot be guaranteed. For example, in the rock-paper-scissors game, the population may cycle among the different strategies. On the other hand, in mortal population games, if the game is two-player zero-sum (which includes rock-paper-scissors), and the policy is quasi-Bayesian with an appropriate prior, convergence is guaranteed. This is because each player can easily learn to guarantee maximin payoff. Continuity arguments probably imply that at least for small perturbations of zero-sum games, there will still be convergence. This leads to some hope that convergence can be guaranteed even in general games, or at least under some relatively mild conditions.
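As a small check of the maximin benchmark in rock-paper-scissors, the sketch below computes the worst-case expected payoff of a mixed strategy. It does not model learning or the population dynamics; it only illustrates the payoff level that a player can guarantee. The payoff convention (+1 for a win, −1 for a loss, 0 for a draw) and the names are assumptions.

```python
from fractions import Fraction

# RPS[a][b]: payoff for playing a against b; 0 = rock, 1 = paper, 2 = scissors.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def guaranteed_payoff(payoff, mix):
    """Worst-case expected payoff of the mixed strategy `mix`.

    Expected payoff is linear in the opponent's mixed strategy, so it is
    enough to take the minimum over the opponent's pure actions.
    """
    return min(
        sum(p * payoff[a][b] for a, p in enumerate(mix))
        for b in range(len(payoff[0]))
    )


uniform = [Fraction(1, 3)] * 3
pure_rock = [Fraction(1), Fraction(0), Fraction(0)]
```

Here `guaranteed_payoff(RPS, uniform)` is 0, the value of the game, whereas `guaranteed_payoff(RPS, pure_rock)` is −1, since a pure strategy can always be exploited.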