// Long comment, no new material, ends with an ill-defined decision theory question. Feel free to collapse it and move on.
Player M is playing rounds of PD against increasingly interesting agents. Players swap source code and then return their decisions. The game payoff matrix takes a pair of actions (one "C"ooperate or "D"efect from each player) and returns a payoff for each player.
PD(a_M, a_X) → <p_M, p_X>, where (a_M, a_X) is a pair of actions and <p_M, p_X> are the corresponding payoffs. M is always the first player.
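A minimal sketch of this payoff function, assuming the standard PD numbers implied later in the post (the <1,1> and <0,3> outcomes, with <3,0> and <2,2> filled in as the usual completion):

```python
# Hypothetical encoding of the PD payoff matrix; the (D,C) and (C,C)
# payoffs are assumed standard PD values, not stated explicitly above.
PD = {
    ("D", "D"): (1, 1),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("C", "C"): (2, 2),
}

def payoff(m_action, other_action):
    """Return <M's payoff, opponent's payoff>; M is always the first player."""
    return PD[(m_action, other_action)]
```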
Round 1: M plays two games against Q-type agents. The first Q, the defect rock, always returns D; the second Q, the cooperate rock, always returns C.
If M sees the defect rock, the game is limited to strategy sets where player 2 defects:
(M=D, Q=D) → <1,1>
(M=C, Q=D) → <0,3>
If M sees the cooperate rock, the game becomes
(M=D, Q=C) → <3,0>
(M=C, Q=C) → <2,2>
M defects in both cases to maximize her payoff. Across all possible rocks, M plays the strategy
Q=(D,C) → M(D,D)
Round 1 can be summarized as: M defected against both rocks, earning 1 against the defect rock and 3 against the cooperate rock.
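Round 1 can be sketched as code, with the rocks as zero-argument agents and M best-responding to whatever action the rock's source code reveals (the names here are illustrative, not from any canonical implementation):

```python
# Round 1 sketch: Q-agents are action "rocks"; M best-responds to each.
PD = {("D", "D"): (1, 1), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("C", "C"): (2, 2)}

def defect_rock():
    return "D"

def coop_rock():
    return "C"

def m_strategy(q_action):
    # Pick the M action that maximizes M's payoff against the rock's
    # fixed action: D beats C whether Q plays D (1 > 0) or C (3 > 2).
    return max("DC", key=lambda m: PD[(m, q_action)][0])

round1 = [m_strategy(rock()) for rock in (defect_rock, coop_rock)]
# round1 is ["D", "D"]: the strategy Q=(D,C) -> M(D,D)
```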
Round 2: M plays against R-type agents. The four types of R agents are
R = (Tantrum, Reverse, Copy, Saint), which is short-hand for
R = (M(D,C)→R(D,D), M(D,C)→R(C,D), M(D,C)→R(D,C), M(D,C)→R(C,C)).
Each predicts M's behavior from her source code and returns an action by its strategy. When M plays against Tantrum, she assumes Tantrum makes accurate predictions about her decision and reasons:
(M=D → R=D) → <1,1>
(M=C → R=D) → <0,3>
and so returns D, which gives her the higher payoff. M's behavior in Round 2 can be summarized by the meta-strategy
R(Tantrum, Reverse, Copy, Saint) → M(D, D, C, D)
Round 2 can be summarized as: M defected against Tantrum, Reverse, and Saint, and cooperated with Copy, earning 1, 3, 2, and 3 against Tantrum, Reverse, Copy, and Saint respectively.
In Round 1 M played “Tantrum”; Q(D,C) → M(D,D).
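The Round 2 reasoning can be sketched the same way, with each R-strategy as a table from M's (accurately predicted) action to an R action; the payoff values are the assumed standard ones from above:

```python
# Round 2 sketch: each R-strategy maps M's predicted action to an R action,
# written M(D,C) -> R(x,y) above. M assumes R predicts her accurately and
# picks the action that maximizes her own payoff.
PD = {("D", "D"): (1, 1), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("C", "C"): (2, 2)}

R_STRATEGIES = {
    "Tantrum": {"D": "D", "C": "D"},  # M(D,C) -> R(D,D)
    "Reverse": {"D": "C", "C": "D"},  # M(D,C) -> R(C,D)
    "Copy":    {"D": "D", "C": "C"},  # M(D,C) -> R(D,C)
    "Saint":   {"D": "C", "C": "C"},  # M(D,C) -> R(C,C)
}

def m_meta_strategy(r):
    """M's action against an R-strategy, given that R predicts M correctly."""
    return max("DC", key=lambda m: PD[(m, r[m])][0])

choices = [m_meta_strategy(R_STRATEGIES[name])
           for name in ("Tantrum", "Reverse", "Copy", "Saint")]
# choices is ["D", "D", "C", "D"]: the meta-strategy
# R(Tantrum, Reverse, Copy, Saint) -> M(D, D, C, D)
```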
Round 3.
???
In round 1, the Qs each returned an action, and M decided by a strategy (Tantrum) that returned an M action for each possible Q action,
Q=(D,C) → M(D,D)
In round 2, the Rs had strategies, which returned an R action for each possible M action,
R = (Tantrum, Reverse, Copy, Saint), which was short-hand for
R = (M(D,C)→R(D,D), M(D,C)→R(C,D), M(D,C)→R(D,C), M(D,C)→R(C,C)),
and M decided by a meta-strategy that returned an M action for each possible R strategy,
R(Tantrum, Reverse, Copy, Saint) → M(D, D, C, D)
Taking this up one more level of meta, it seems round 3 should have M facing S-players who decide by meta-strategies. The sixteen meta-strategies map M(Tantrum, Reverse, Copy, Saint) onto S(D, D, D, D) through S(C, C, C, C). I'll spare the enumeration.
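The spared enumeration is mechanical; under the same labels, one possible sketch of the sixteen S meta-strategies:

```python
# Enumerate the sixteen S meta-strategies: each maps the strategy M plays
# by (labelled Tantrum, Reverse, Copy, Saint) to an S action.
from itertools import product

M_LABELS = ("Tantrum", "Reverse", "Copy", "Saint")

s_meta_strategies = [dict(zip(M_LABELS, actions))
                     for actions in product("DC", repeat=4)]
# 2**4 = 16 meta-strategies, from S(D,D,D,D) through S(C,C,C,C)
```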
But this is problematic: M tries to maximize her winnings from PD. PD takes the actions of M and S as input, and S's action depends on M, so M is really maximizing PD(M, S(M)). But S-players select actions based on M's strategy (cooperate if M plays by Copy or Tantrum, et cetera), not on her action. Against S-players M decides by a meta-meta-strategy, against R-players by a meta-strategy, and only against Q-agents does she decide by a normal strategy.
So we could build a round 3 game where S simulates M in PDs with Qs, but this isn't a fair game, because M doesn't know that the real payoff to maximize is in the game with S, not in the simulated game with Qs. Also, it's not clear to me how M selects any action for the PD with S. Is there a well-defined game which is "a level of meta up" from round 2 and fair to M? Preferably something not quite as hard as "M against all rational agents".