An extension of Aumann’s approach for reducing game theory to Bayesian decision theory to include EDT- and UDT-like agents
In “Correlated Equilibrium as an Expression of Bayesian Rationality”, Aumann developed a formalism that reduces Nash-style game theory to Bayesian decision making in a multi-agent setting. Within that formalism he proved that, under common knowledge of “rationality” (read: common knowledge that every agent is running CDT) and the assumption that every agent knows his own action, the result is a correlated equilibrium. As it turns out, it is relatively straightforward to extend this framework to EDT- and UDT-like agents, which is the goal of the rest of this post.
We will use the same notation as Aumann: $\Omega$ is the space of all possible worlds (which we assume finite), $p$ is a probability distribution on $\Omega$, $S_i$ is the set of all possible actions for player $i$, $s_i$ is a function from $\Omega$ to $S_i$ returning the action player $i$ chooses in a given world, $\mathcal{P}_i$ is a partition of $\Omega$ representing the possible information states of player $i$, and finally $h_i$ is a function from $\prod_i S_i$ to $\mathbb{R}$ representing the utility function of player $i$.
First, we can ask what happens if we relax the condition that each player knows their own action, which allows players to act non-deterministically. Without any further conditions, this lets the players’ actions be correlated in arbitrary ways, even when no player has any source of information about the others’ strategies. For example, suppose two agents, 1 and 2, play rock-paper-scissors with both information partitions trivial. We could have $\Omega=\{1,2,3\}$ with $p(1)=p(2)=p(3)=1/3$ and $s_1(1)=\text{Rock}$, $s_2(1)=\text{Paper}$, $s_1(2)=\text{Paper}$, $s_2(2)=\text{Scissor}$, $s_1(3)=\text{Scissor}$, $s_2(3)=\text{Rock}$. Then both agents satisfy the CDT condition, yet player 2 always plays the counter to player 1’s action, as if able to predict it, despite having a trivial information partition. To avoid this situation we can require the following condition, which we shall call the strategic independence condition:
SI: Given $P_i\in\mathcal{P}_i$ for every player $i$, the probability distributions on the strategies $s_i$ of all players are independent when conditioned on $\bigcap_i P_i$.
This condition is not always appropriate: think, for example, of two identical agents playing a symmetric game, who always make the same move when they have the same information. In general, though, it is a pretty reasonable condition.
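The three-world rock-paper-scissors example can be checked mechanically. The sketch below is only an illustration: it assumes the CDT condition means that, in every world, the chosen action is a best response to the opponent's action distribution held fixed (the post does not spell this out), and it reads SI with trivial partitions as "the joint distribution of actions equals the product of the marginals". The helper names (`causal_value`, `satisfies_cdt`, `satisfies_si`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ["Rock", "Paper", "Scissor"]
BEATS = {"Rock": "Scissor", "Paper": "Rock", "Scissor": "Paper"}  # key beats value

def payoff(mine, theirs):
    """Zero-sum rock-paper-scissors utility: +1 win, -1 loss, 0 tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

# The three-world example from the text; both partitions are trivial.
worlds = [1, 2, 3]
p = {w: Fraction(1, 3) for w in worlds}
s1 = {1: "Rock", 2: "Paper", 3: "Scissor"}
s2 = {1: "Paper", 2: "Scissor", 3: "Rock"}

def causal_value(action, opponent):
    """E[h(action, s_opponent)] with the opponent's distribution held fixed."""
    return sum(p[w] * payoff(action, opponent[w]) for w in worlds)

def satisfies_cdt(own, opponent):
    """Assumed CDT condition: every chosen action is a causal best response."""
    best = max(causal_value(a, opponent) for a in ACTIONS)
    return all(causal_value(own[w], opponent) == best for w in worlds)

def joint(a1, a2):
    return sum(p[w] for w in worlds if s1[w] == a1 and s2[w] == a2)

def marginal(s, a):
    return sum(p[w] for w in worlds if s[w] == a)

def satisfies_si():
    """SI with trivial partitions: joint law equals product of marginals."""
    return all(joint(a1, a2) == marginal(s1, a1) * marginal(s2, a2)
               for a1, a2 in product(ACTIONS, ACTIONS))

cdt_ok = satisfies_cdt(s1, s2) and satisfies_cdt(s2, s1)
p2_win_probability = sum(p[w] for w in worlds if payoff(s2[w], s1[w]) == 1)
si_ok = satisfies_si()
print(cdt_ok, p2_win_probability, si_ok)
```

Under these assumptions both players pass the CDT check (every action has causal value 0 against a uniform opponent), player 2 wins in every world, and SI fails, which is exactly the correlation the condition is meant to rule out.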
Now, even with condition SI we still obtain more equilibria than just the correlated equilibria, because an agent can obtain information directly about their opponent’s action. For example, suppose two agents play rock-paper-scissors, where agent 1 has a trivial information partition and player 2’s information partition has three elements, each containing a single world. The three worlds correspond to the three possible actions of player 1: in each world player 1 performs the corresponding action and player 2 performs the action that beats it, and each world has probability 1/3. It is easy to see that each agent satisfies the CDT condition and that SI holds. But this is clearly not a correlated equilibrium.
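The claims in this second example can also be verified by brute force. The following sketch again assumes a reading of the CDT condition (best response to the opponent's action distribution conditioned on one's own information cell) and uses the standard correlated-equilibrium inequality, under which no player gains by deviating from a recommended action. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ["Rock", "Paper", "Scissor"]
BEATS = {"Rock": "Scissor", "Paper": "Rock", "Scissor": "Paper"}  # key beats value
COUNTER = {loser: winner for winner, loser in BEATS.items()}  # what beats each action

def payoff(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

worlds = list(ACTIONS)  # one world per action of player 1
p = {w: Fraction(1, 3) for w in worlds}
s = {1: {w: w for w in worlds}, 2: {w: COUNTER[w] for w in worlds}}
# Player 1 learns nothing; player 2 learns the world exactly.
partition = {1: [set(worlds)], 2: [{w} for w in worlds]}

def prob(event):
    return sum(p[w] for w in event)

def causal_value(i, action, cell):
    """E[h_i(action, s_other) | cell], opponent's conditional law held fixed."""
    other = 3 - i
    return sum(p[w] * payoff(action, s[other][w]) for w in cell) / prob(cell)

def satisfies_cdt(i):
    for cell in partition[i]:
        best = max(causal_value(i, a, cell) for a in ACTIONS)
        if any(causal_value(i, s[i][w], cell) != best for w in cell):
            return False
    return True

def satisfies_si():
    """Actions are independent conditional on every nonempty intersection of cells."""
    for c1, c2 in product(partition[1], partition[2]):
        cell = c1 & c2
        if not cell:
            continue
        for a1, a2 in product(ACTIONS, ACTIONS):
            j = prob({w for w in cell if s[1][w] == a1 and s[2][w] == a2})
            m1 = prob({w for w in cell if s[1][w] == a1})
            m2 = prob({w for w in cell if s[2][w] == a2})
            if j * prob(cell) != m1 * m2:
                return False
    return True

def is_correlated_equilibrium():
    """No player gains by swapping a recommended action for another one."""
    for i in (1, 2):
        other = 3 - i
        for rec, dev in product(ACTIONS, ACTIONS):
            gain = sum(p[w] * (payoff(dev, s[other][w]) - payoff(rec, s[other][w]))
                       for w in worlds if s[i][w] == rec)
            if gain > 0:
                return False
    return True

results = (satisfies_cdt(1), satisfies_cdt(2), satisfies_si(),
           is_correlated_equilibrium())
print(results)
```

The correlated-equilibrium check fails for player 1: when "recommended" Rock, player 1 faces Paper with certainty and would gain by switching to Scissor, even though the causal value of every action against player 2's unconditional distribution is the same.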
Now let’s look at relaxing the first condition (common knowledge that every agent runs CDT) and define EDT and UDT agents.
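Definition: An agent $i$ is an EDT agent in world $\omega\in\Omega$ with $\omega\in P\in\mathcal{P}_i$ iff $\forall a\in S_i:\ E(h_i(s)\mid P, s_i=a)\le E(h_i(s)\mid P, s_i=s_i(\omega))$, where $s=(s_1,\dots,s_n)$ is the profile of actions actually played.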
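To make the difference with CDT concrete, here is a small sketch that evaluates the EDT condition on the three-world rock-paper-scissors example with trivial partitions. It is an illustration under two assumptions that are not in the post: the CDT value of an action is taken to be its expected utility with the opponent's distribution held fixed, and actions whose conditioning event has probability zero are simply skipped in the EDT comparison. The function names are hypothetical.

```python
from fractions import Fraction

ACTIONS = ["Rock", "Paper", "Scissor"]
BEATS = {"Rock": "Scissor", "Paper": "Rock", "Scissor": "Paper"}  # key beats value

def payoff(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

worlds = [1, 2, 3]
p = {w: Fraction(1, 3) for w in worlds}
s1 = {1: "Rock", 2: "Paper", 3: "Scissor"}
s2 = {1: "Paper", 2: "Scissor", 3: "Rock"}
trivial_cell = set(worlds)

def evidential_value(action, own, opponent, cell):
    """E[h | cell, s_own = action]; None if that event has probability 0."""
    event = [w for w in cell if own[w] == action]
    mass = sum(p[w] for w in event)
    if mass == 0:
        return None
    return sum(p[w] * payoff(own[w], opponent[w]) for w in event) / mass

def causal_value(action, opponent, cell):
    """E[h(action, s_opponent) | cell], ignoring what the action is evidence of."""
    mass = sum(p[w] for w in cell)
    return sum(p[w] * payoff(action, opponent[w]) for w in cell) / mass

def is_edt_agent(world, own, opponent, cell):
    """EDT condition from the definition; undefined conditionals are skipped."""
    chosen = evidential_value(own[world], own, opponent, cell)
    values = [evidential_value(a, own, opponent, cell) for a in ACTIONS]
    return all(v is None or v <= chosen for v in values)

edt_values_p1 = {a: evidential_value(a, s1, s2, trivial_cell) for a in ACTIONS}
cdt_values_p1 = {a: causal_value(a, s2, trivial_cell) for a in ACTIONS}
p1_is_edt = all(is_edt_agent(w, s1, s2, trivial_cell) for w in worlds)
print(edt_values_p1, cdt_values_p1, p1_is_edt)
```

For player 1 every action has evidential value -1 (choosing it is perfect evidence that player 2 plays the counter) but causal value 0, so the EDT condition holds only because all actions are equally bad.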
In order to define a UDT agent we need a notion of type. A type for player $i$ is a function from $\mathcal{P}_i$ to $S_i$ representing the action the agent intends to take depending on the information they obtain. We will of course require that if agent $i$ is of type $\tau$ in world $\omega\in P\in\mathcal{P}_i$, then $s_i(\omega)=\tau(P)$. Another condition we can impose is that the probability distribution on types be independent of which $P\in\mathcal{P}_i$ actually obtains; we shall call this the Type Consistency condition, TC. We denote the type of agent $i$ in world $\omega\in\Omega$ by $\tau_i(\omega)$. We can then define a UDT agent as follows:
Definition: An agent $i$ is a UDT agent in world $\omega\in\Omega$ iff $\forall t:\mathcal{P}_i\to S_i,\ E(h_i(s)\mid\tau_i=t)\le E(h_i(s)\mid\tau_i=\tau_i(\omega))$.
Let’s introduce the concept of a transparency function. A deterministic transparency function $t_i$ for player $i$ is a function from the product of the type spaces of all other players to $\mathcal{P}_i$ such that $p(t_i(\tau_{-i})\mid\tau_{-i})=1$. A stochastic transparency function gives the probability distribution over the $P\in\mathcal{P}_i$ conditioned on the types of the other players. These functions essentially allow us to model situations where a player’s information partition gives information about the types of the other players.
If we look at the prisoner’s dilemma, it is obvious that a CDT agent will always defect, but this is not the case for an EDT or UDT agent. For example, consider the case where both players have an information partition with two elements and a deterministic transparency function that sends the opponent type which cooperates in the first element of the partition and defects in the second to the first element of the information partition, and all other types to the second. Then, if all agents satisfy the following condition:
PER($p$, $i$): $P(i\text{ is an EDT agent}\mid P_i)\ge p$ for all $P_i\in\mathcal{P}_i$
in addition to TC, then for $p$ sufficiently close to 1 mutual cooperation happens in every world where both players are EDT agents. For UDT, on the other hand, the situation is a little more complicated. If we just demand a similar condition for every agent
PUR($p$, $i$): $P(i\text{ is a UDT agent})\ge p$
in addition to TC, that is not sufficient to guarantee mutual cooperation, even in the worlds where both agents are UDT agents. Instead, we will assume that under our prior the agents act like thermally perturbed UDT agents:
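TUR($\lambda$, $i$): $\forall t:\mathcal{P}_i\to S_i,\quad P(\tau_i=t)=\dfrac{e^{\lambda E(h_i(s)\mid\tau_i=t)}}{\sum_{\tau:\mathcal{P}_i\to S_i}e^{\lambda E(h_i(s)\mid\tau_i=\tau)}}$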
If this condition holds for all players for some sufficiently large $\lambda$, in addition to TC, then mutual cooperation is guaranteed in any world where both agents are UDT agents.
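The mechanism behind the prisoner’s dilemma example can be sketched numerically. The code below is a simplified illustration, not the full model: the payoff numbers are hypothetical, the opponent is held fixed at the "cooperate in the first element, defect in the second" type (whereas the actual thermal condition is a fixed-point condition on the joint prior), and the names `observe`, `utility` and `thermal_distribution` are made up for this sketch.

```python
import math
from itertools import product

PARTITION = ("P1", "P2")
ACTIONS = ("C", "D")
# Hypothetical payoffs: (my action, opponent action) -> my utility.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# A type maps each information state to an action: four types in total.
TYPES = [dict(zip(PARTITION, acts)) for acts in product(ACTIONS, repeat=2)]
TRANSPARENT = {"P1": "C", "P2": "D"}  # cooperate in P1, defect in P2

def observe(opponent_type):
    """Deterministic transparency function: P1 iff the opponent is TRANSPARENT."""
    return "P1" if opponent_type == TRANSPARENT else "P2"

def utility(my_type, opponent_type):
    """Utility when both players act on what the transparency function shows."""
    my_action = my_type[observe(opponent_type)]
    their_action = opponent_type[observe(my_type)]
    return PAYOFF[(my_action, their_action)]

def thermal_distribution(lam, opponent_type):
    """Boltzmann weights over own types, with the opponent's type held fixed."""
    weights = [math.exp(lam * utility(t, opponent_type)) for t in TYPES]
    total = sum(weights)
    return [w / total for w in weights]

utilities = [utility(t, TRANSPARENT) for t in TYPES]
index = TYPES.index(TRANSPARENT)
for lam in (0, 1, 5):
    print(lam, round(thermal_distribution(lam, TRANSPARENT)[index], 4))
```

Against an opponent of that type, being the same type is the unique best choice of type (mutual cooperation), since every other type makes the opponent defect. As $\lambda$ grows, the Boltzmann weights concentrate on it, which is the intuition behind requiring a large $\lambda$.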
Now let’s look at Chicken. Suppose each player has a partition with two elements and a transparency function that sends the opponent type that always goes straight to one element of the partition and all other types to the other element. If one of the agents is a CDT agent or obeys PER($p$, 1), while the other obeys PUR($p$, 2) for $p$ sufficiently large, and we assume TC, then with probability close to 1 the UDT agent goes straight while the opponent swerves.
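A rough way to see why the UDT agent ends up going straight, at least against a CDT opponent: if the "always straight" type is revealed by the transparency function, the opponent knows the UDT agent's action and best-responds to it. The sketch below uses hypothetical Chicken payoffs and only covers this deterministic CDT case; it does not model the EDT opponent or the "probability close to 1" part of the claim.

```python
ACTIONS = ("Straight", "Swerve")
# Hypothetical Chicken payoffs: (my action, opponent action) -> my utility.
PAYOFF = {
    ("Straight", "Straight"): -10,
    ("Straight", "Swerve"): 1,
    ("Swerve", "Straight"): -1,
    ("Swerve", "Swerve"): 0,
}

def cdt_best_response(opponent_action):
    """Best action for a CDT agent that knows the opponent's action."""
    return max(ACTIONS, key=lambda a: PAYOFF[(a, opponent_action)])

# The transparency function reveals the "always straight" type, so the
# CDT opponent knows the UDT agent will go Straight and responds to that.
opponent_action = cdt_best_response("Straight")
udt_utility = PAYOFF[("Straight", opponent_action)]
best_possible = max(PAYOFF.values())
print(opponent_action, udt_utility, best_possible)
```

With these payoffs the opponent swerves, and the "always straight" type earns the highest payoff available in the game, so no other type can do better for the UDT agent.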
As further avenues of research, one would ideally want to show that EDT/UDT agents defined this way cooperate with one another in the prisoner’s dilemma (with large probability, under the appropriate conditions) if they have “enough information about their opponent’s type” in some appropriate sense. In the case of Chicken, we have seen that UDT agents, as defined in this formalism, can successfully exploit CDT and EDT agents given the right transparency function. But the formalism allows for a wide variety of possibilities when two UDT agents play against one another, so one could ask what further reasonable conditions could be imposed to constrain the space of possible equilibria.