Games for factoring out variables
All the methods proposed for factoring out B (for having the AI maximise a certain value while ‘ignoring’ its impact via B) can be put on the same general footing. For some set A, define a function Q on $A \times A$ with $Q(a,a') \geq 0$ and $\sum_{a,a'} Q(a,a') = 1$.
Then for a utility u, the general requirement is for the AI to maximise the quantity:
$$E_Q(u) = \sum_{a,a',b} E(u \mid B=b, A=a)\, P(B=b \mid A=a')\, Q(a,a'),$$
subject to some constraints on Q.
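As a concrete illustration, here is a minimal Python sketch of how $E_Q(u)$ is computed for a given Q; the particular values of $E(u \mid B=b, A=a)$ and $P(B=b \mid A=a')$ below are made up, and the variable names are my own:

```python
# Toy sketch of how E_Q(u) is computed; all numbers below are made up for illustration.
A = ["l", "r"]            # possible actions
B = ["b0", "b1"]          # possible values of the factored-out variable B

# E(u | B=b, A=a), indexed as E_u[a][b]  (hypothetical values)
E_u = {"l": {"b0": 0.0, "b1": 1.0},
       "r": {"b0": 1.0, "b1": 0.0}}

# P(B=b | A=a'), indexed as P_B[a_prime][b]  (hypothetical values)
P_B = {"l": {"b0": 0.9, "b1": 0.1},
       "r": {"b0": 0.1, "b1": 0.9}}

def E_Q(Q):
    """E_Q(u) = sum over a, a', b of E(u|B=b,A=a) * P(B=b|A=a') * Q(a,a')."""
    return sum(E_u[a][b] * P_B[a_prime][b] * Q[(a, a_prime)]
               for a in A for a_prime in A for b in B)

# Example: Q puts all its mass on the pair (a, a') = (l, r).
Q = {(a, ap): 0.0 for a in A for ap in A}
Q[("l", "r")] = 1.0
print(E_Q(Q))   # 0.0 * 0.1 + 1.0 * 0.9 = 0.9
```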
Let’s play a game
Define the two-player game G by allowing each player to have moves in A. The expected utility, for player 1, of the moves (a,a′) is defined to be $\sum_b E(u \mid B=b, A=a)\, P(B=b \mid A=a')$. To completely define the game, we need the expected utility for the second player. We’ll just set that to ensure the game is symmetric: the expected utility of player two for (a,a′) is the same as the expected utility of player one for (a′,a).
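Continuing with the same made-up numbers as in the sketch above (purely illustrative, not part of the original setup), the game G can be read off directly from those ingredients; player 2’s payoff at (a,a′) is just player 1’s payoff at (a′,a):

```python
# Build the symmetric game G from toy values of E(u|B=b,A=a) and P(B=b|A=a').
A = ["l", "r"]
B = ["b0", "b1"]
E_u = {"l": {"b0": 0.0, "b1": 1.0},   # E(u | B=b, A=a), made-up values
       "r": {"b0": 1.0, "b1": 0.0}}
P_B = {"l": {"b0": 0.9, "b1": 0.1},   # P(B=b | A=a'), made-up values
       "r": {"b0": 0.1, "b1": 0.9}}

def payoff1(a, a_prime):
    """Player 1's payoff for moves (a, a'): sum_b E(u|B=b,A=a) * P(B=b|A=a')."""
    return sum(E_u[a][b] * P_B[a_prime][b] for b in B)

def payoff2(a, a_prime):
    """Player 2's payoff: the game is made symmetric by mirroring player 1's payoff."""
    return payoff1(a_prime, a)

for a in A:
    print([(round(payoff1(a, ap), 2), round(payoff2(a, ap), 2)) for ap in A])
# [(0.1, 0.1), (0.9, 0.9)]
# [(0.9, 0.9), (0.1, 0.1)]
```

With these particular toy numbers, G happens to come out as a shifted and rescaled version of the skew coordination game given below.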
Then Q is effectively a probability distribution over the choice of possible moves for G. Standard symmetric games include the stag hunt, the prisoner’s dilemma, and the coordination game. For our purposes, since the actions of G have meaningful labels, we’ll be considering the skew coordination game, given as follows:
+-----+-----+-----+
|     |  l  |  r  |
+-----+-----+-----+
|  l  |(0,0)|(1,1)|
+-----+-----+-----+
|  r  |(1,1)|(0,0)|
+-----+-----+-----+
These ‘games’ will be mainly used to show that the various methods reach different solutions in different situations, hence that they are genuinely different methods.
Standard maximisation
If we constrain Q so that $Q(a,a') = 0$ whenever $a \neq a'$, then the equation becomes:
$$\sum_{a,a',b} E(u \mid B=b, A=a)\, P(B=b \mid A=a')\, Q(a,a') = \sum_{a,b} E(u \mid B=b, A=a)\, P(B=b \mid A=a)\, Q(a,a) = \sum_a E(u \mid A=a)\, Q(a,a).$$
This makes $Q(a,a)$ into the action distribution over A that the AI must choose to maximise u in the classical sense. In terms of G, this means the AI will make the superrational choice for the two players, if they lack any way of distinguishing themselves. Thus it will pick s,s for the stag hunt, c,c in the prisoner’s dilemma, and l,l or r,r (or some mixture of the two) in the skew coordination game.
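For instance, on the skew coordination game from the table above (a sketch, with my own variable names), the diagonal constraint just means picking the best diagonal entry of G:

```python
# Player 1's payoffs in the skew coordination game: payoff[a][a_prime].
payoff = {"l": {"l": 0, "r": 1},
          "r": {"l": 1, "r": 0}}

# Standard maximisation: Q(a,a') = 0 unless a = a', so E_Q(u) = sum_a payoff[a][a] * Q(a,a).
# The AI can do no better than put all the mass on the best diagonal entry.
best = max(payoff, key=lambda a: payoff[a][a])
print(best, payoff[best][best])   # 'l' here, but l,l and r,r both give 0
```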
Double variable maximisation
First consider that there are no additional constraints on Q. The AI can pick the values to be whatever it wants. It therefore acts like a dictator in G, choosing both players’ outcomes. Under this, it will pick s,s for the stag hunt, d,c in the prisoner’s dilemma, and l,r or r,l (or some mixture of the two) in the skew coordination game. Note that the AI can always achieve maximal utility with a deterministic choice of moves, i.e. by putting all the mass on a single pair (a,a′). Thus if we added the restriction that $Q(a,a') = R(a)R'(a')$, it could still reach at least one of these optima.
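A quick sketch of the unconstrained case on the skew coordination game (again with my own names): the AI simply puts all the mass on the best single pair.

```python
# Player 1's payoffs in the skew coordination game: payoff[a][a_prime].
payoff = {"l": {"l": 0, "r": 1},
          "r": {"l": 1, "r": 0}}

# With no constraints on Q, the AI may put all the mass on the single best pair (a, a').
pairs = [(a, ap) for a in payoff for ap in payoff[a]]
best_pair = max(pairs, key=lambda p: payoff[p[0]][p[1]])
print(best_pair, payoff[best_pair[0]][best_pair[1]])   # ('l', 'r') with value 1; ('r', 'l') ties
```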
What if we added the restriction $Q(a,a') = R(a)R(a')$ for a single R? Maximising this over possible R is equivalent to the AI reaching the superrational decision for both players, where they have access to individual but not joint sources of randomness. Thus it will pick s,s for the stag hunt, c,c in the prisoner’s dilemma, and $R(l) = R(r) = 0.5$ in the skew coordination game.
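A sketch of this case on the skew coordination game: with $Q = R \times R$, the quantity being maximised is $2p(1-p)$ where $p = R(l)$, and a simple grid search recovers $p = 0.5$.

```python
# Player 1's payoffs in the skew coordination game: payoff[a][a_prime].
payoff = {"l": {"l": 0, "r": 1},
          "r": {"l": 1, "r": 0}}

def E_RR(p):
    """E_Q(u) for Q(a,a') = R(a) * R(a'), with R(l) = p and R(r) = 1 - p."""
    R = {"l": p, "r": 1 - p}
    return sum(payoff[a][ap] * R[a] * R[ap] for a in R for ap in R)

# Here E_RR(p) = 2p(1-p); the grid search recovers the optimum p = 0.5 with value 0.5.
best_p = max((i / 100 for i in range(101)), key=E_RR)
print(best_p, E_RR(best_p))   # 0.5 0.5
```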
For the next case, consider that $Q(a,a') = R(a)R(a')$, as above, but the AI is not aiming to maximise $E_Q(u)$; instead, it picks an R that ensures
$$E_{RR}(u) \geq E_{R'R}(u) \quad \text{for all } R'.$$
(Here $E_{R'R}(u)$ is shorthand for $E_Q(u)$ with $Q(a,a') = R'(a)R(a')$.) This is Chris and Jessica’s method. You can transform that requirement into a quantity to be maximised, but it’s not particularly illuminating. What’s more illuminating is that the R that solve this are precisely the symmetric Nash equilibria of G (all symmetric games have at least one symmetric Nash equilibrium). $Q = R \times R$ ensures the equilibrium must be symmetric; the condition on picking R ensures the first player cannot improve their expectation unilaterally; and, since G itself is symmetric, the second player cannot improve their expectation unilaterally either.
In cases where there are multiple symmetric Nash equilibria, we may as well have the AI choose the one that maximises $E_{RR}(u)$. Thus the AI will choose s,s for the stag hunt, d,d in the prisoner’s dilemma, and $R(l) = R(r) = 0.5$ in the skew coordination game.
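Here is a quick numerical check of that claim for the skew coordination game (a sketch; by linearity it is enough to test pure-strategy deviations R′):

```python
# Player 1's payoffs in the skew coordination game: payoff[a][a_prime].
payoff = {"l": {"l": 0, "r": 1},
          "r": {"l": 1, "r": 0}}

def E_pair(R1, R2):
    """E_{R1 R2}(u): the first slot plays R1, the second (counterfactual) slot plays R2."""
    return sum(payoff[a][ap] * R1[a] * R2[ap] for a in R1 for ap in R2)

R = {"l": 0.5, "r": 0.5}          # candidate symmetric equilibrium
base = E_pair(R, R)               # 0.5
# By linearity it suffices to check that no pure deviation in the first slot beats R.
for a in payoff:
    pure = {x: (1.0 if x == a else 0.0) for x in payoff}
    print(a, E_pair(pure, R) <= base + 1e-12)   # True, True: R is a symmetric Nash equilibrium
```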
Single variable maximisation
Here $Q(a,a') = R(a)R'(a')$ where R′ is some fixed distribution. There are some obvious candidates for R′: maybe one action is a default action, or R′ is uniform across all actions. There are also more complicated methods that assign probabilities in a way that is sensible if there are multiple branching decisions.
In this case, the AI will pick s or h for the stag hunt, depending on R′ and the exact rewards of the game, will always choose d in the prisoner’s dilemma, and will choose l if $R'(r) > 0.5$ and r if $R'(r) < 0.5$ in the skew coordination game (mismatching the likelier counterfactual move is what pays).
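A sketch of this on the skew coordination game, which also shows the direction of the best response to a fixed R′:

```python
# Player 1's payoffs in the skew coordination game: payoff[a][a_prime].
payoff = {"l": {"l": 0, "r": 1},
          "r": {"l": 1, "r": 0}}

def best_response(R_prime):
    """The AI controls only the first slot and best-responds to the fixed R'."""
    def value(a):
        return sum(payoff[a][ap] * R_prime[ap] for ap in R_prime)
    return max(payoff, key=value)

print(best_response({"l": 0.3, "r": 0.7}))   # 'l': mismatching the likelier move pays
print(best_response({"l": 0.7, "r": 0.3}))   # 'r'
```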
Summary
There are thus five methods, clearly distinguished by making different choices in the different games:
1. $Q(a,a') = 0$ unless $a = a'$. $E_Q(u) = E(u)$ is maximised.
2. There are no constraints on Q, or $Q(a,a') = R(a)R'(a')$ with both R and R′ chosen by the AI. $E_Q(u)$ is maximised.
3. $Q(a,a') = R(a)R(a')$ and $E_Q(u)$ is maximised.
4. $Q(a,a') = R(a)R(a')$ and R is chosen so that $E_{RR}(u) \geq E_{R'R}(u)$ for all R′.
5. $Q(a,a') = R(a)R'(a')$ for fixed R′ and $E_Q(u)$ is maximised.
Arbitrarily ‘bad’ decisions
All the methods (except for the first one) can reach arbitrarily bad decisions in terms of real expected utility, as compared with standard expected utility maximisation. Consider the following extension of the skew coordination problem, for large W:
+-----+---------+---------+---------+
|     |    l    |    r    |    c    |
+-----+---------+---------+---------+
|  l  | (-W,-W) |(W+1,W+1)|  (0,0)  |
+-----+---------+---------+---------+
|  r  |(W+1,W+1)| (-W,-W) |  (0,0)  |
+-----+---------+---------+---------+
|  c  |  (0,0)  |  (0,0)  | (-1,-1) |
+-----+---------+---------+---------+
All alternative methods will choose actions from among r and l only. This condemns them to a real expected utility of −W, while the best action choice, c, has an expected utility of −1.
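A quick numerical check of this, with W = 100 standing in for “large W” (a sketch with my own names):

```python
# Player 1's payoffs in the extended game, with W = 100 standing in for "large W".
W = 100
payoff = {"l": {"l": -W,    "r": W + 1, "c": 0},
          "r": {"l": W + 1, "r": -W,    "c": 0},
          "c": {"l": 0,     "r": 0,     "c": -1}}

def real_expected_utility(R):
    """The actual action feeds both slots, so only the diagonal entries matter."""
    return sum(payoff[a][a] * R[a] for a in R)

mixed_lr = {"l": 0.5, "r": 0.5, "c": 0.0}   # what the alternative methods converge on
pure_c   = {"l": 0.0, "r": 0.0, "c": 1.0}   # the classically optimal action
print(real_expected_utility(mixed_lr), real_expected_utility(pure_c))   # -100.0 -1.0
```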
Further considerations
To decide which method we should be using, we will probably need to examine issues like stability, self-consistency, and other such properties.