Modal Chicken
I thought it might be good fun to try doing modal chicken. This is my first time getting into the dirty technical details of MIRI’s research, so do tolerate and point out misunderstandings.
Chicken is a game where two drivers drive towards each other on a collision course: one must swerve, or both die in the crash, but if one driver swerves and the other does not, the one who swerved will be called a “chicken,” and lose social status. Chicken differs from the Prisoner’s Dilemma in that the sucker’s payoff is preferable to the punishment payoff, as opposed to the converse. (If your opponent defects, it is more preferable to defect than to cooperate. If your opponent doesn’t swerve, it is more preferable to swerve than not to swerve.) That is, Chicken is an anti-coordination game: a game in which it is mutually beneficial for the players to play different strategies. We define the payoff matrix as follows:
In the game of Chicken, we want to define agents that always swerve against themselves, and that only swerve if their opponents don’t swerve against them. Let’s try defining some modal agents with these properties.
As usual, the agents herein are defined as modal formulas in Godel-Lob provability logic. In particular, our ″agents″ will be formulas in Peano Arithmetic, and our criterion for action will be the existence of a finite proof in the tower of formal systems PA+n, where PA is Peano Arithmetic, and PA+(n+1) is the formal system whose axioms are the axioms of PA+n, plus the axiom that PA+n is consistent.
Fix a particular Godel numbering scheme, and let and each denote well-formed formulas with one free variable. Then let denote the formula where we replace each instance of the free variable in with the Godel number of . If such a formula holds in the standard model of Peano Arithmetic, we interpret that as swerving against ; if its negation holds, we interpret that as not swerving against . In particular, we will prove theorems in PA+n to establish whether the agents we discuss swerve or don’t swerve against one another. Thus we can regard such formulas of arithmetic as decision-theoretic agents, and we will use ″source code″ to refer to their Godel numbers. and will be used interchangeably with and .
Some very simple agents are worth defining first, like SwerveBot, the modal agent that always swerves:
Definition 1 (SwerveBot).
And naturally, NerveBot, the modal agent that never swerves:
Theorem 1.1 is incorrect. The flaw in the proof is that CB(CB)=S does not imply that Provable[CB(CB)=S]. Also, the conclusion of the theorem must be incorrect because the claim would imply that the provability predicate is unsound.
AstuteBot is complicated enough that I can’t see what it’s supposed to be doing, but in your natural language description of AstuteBot, your parenthetical rephrasings say different things than the parts of the description they are intended to make precise.
I wonder if there is a way that you might go about defining modal agents that reason something like:
and, very roughly:
It seemed weird to me that there would be something wrong with defining something like CarefulBot because it seems pretty analogous to the FairBot of the modal Prisoner’s Dilemma.
You can’t define a modal agent that swerves if and only if the other player doesn’t swerve, because “the other player doesn’t swerve” is not a modalized formula.
CarefulBot does not seem much like FairBot to me. It also seems undeserving of its name, since it doesn’t swerve unless it can prove that the other player won’t. Perhaps instead you should make it so it swerves unless it can prove that the other player does?
Per AlexMennen, I’d rename CarefulBot-as-formulated to something like BoldBot, and apply the name CarefulBot to “swerve unless it’s proven the other bot swerves.”
I think (I’m an amateur here, full disclosure; could be mistakes) we can capture some of the behavior you’re interested in by considering bots with different levels of proof system. So e.g. CBn swerves unless it can prove, using PA+n, that the opponent swerves.
Then we see that for n > m, CBn(CBm) = D. Proof: Suppose CBn doesn’t swerve. In PA+n, it is an axiom that provability in PA+m implies truth. Therefore PA+m cannot prove that CBn swerves. In a contest between CarefulBots, the stronger proof system wins.
Now consider BoldBot. We can see that BBn does not swerve against BBn, because by symmetry both agents prove the same thing; and if they both proved their opponent does not swerve, they would both swerve, and both have proven a falsehood.
Analyzing BBn vs. BBm is proving difficult, but I’m going to try a bit more.
Actually, I notice that BBn vs. BBm is isomorphic to CBn vs. CBm! Just interchange ‘swerve’ and ‘don’t swerve’ in the specification of one to convert it into the other. This implies that BBn swerves against BBm, and BBm does not swerve, if my proof about CBn vs. CBm is valid. I’m no longer so sure it is...
I had never heard of the ‘chicken’ game before. It seems the punishment of ‘losing social status’ would apply to real-world prisoner’s dilemma-like games as well: The cheater gets called a ‘rat’ and suffers from loss of social status.
The only thing we’re really interested in are the payoff matrices, and the fact remains that where P is the punishment payoff, S is the sucker’s payoff, R is the reward payoff, and T is the temptation payoff, Chicken is the canonical game for games with a payoff ordering of P < S < R < T, and the Prisoner’s Dilemma is the canonical game for games with a payoff ordering of S < P < R < T.