Modal Chicken

I thought it might be good fun to try doing modal chicken. This is my first time getting into the dirty technical details of MIRI’s research, so do tolerate and point out misunderstandings.

Chicken is a game where two drivers drive towards each other on a collision course: one must swerve, or both die in the crash, but if one driver swerves and the other does not, the one who swerved will be called a “chicken,” and lose social status. Chicken differs from the Prisoner’s Dilemma in that the sucker’s payoff is preferable to the punishment payoff, as opposed to the converse. (If your opponent defects, it is more preferable to defect than to cooperate. If your opponent doesn’t swerve, it is more preferable to swerve than not to swerve.) That is, Chicken is an anti-coordination game: a game in which it is mutually beneficial for the players to play different strategies. We define the payoff matrix as follows:

In the game of Chicken, we want to define agents that always swerve against themselves, and that only swerve if their opponents don’t swerve against them. Let’s try defining some modal agents with these properties.

As usual, the agents herein are defined as modal formulas in Godel-Lob provability logic. In particular, our ″agents″ will be formulas in Peano Arithmetic, and our criterion for action will be the existence of a finite proof in the tower of formal systems PA+n, where PA is Peano Arithmetic, and PA+(n+1) is the formal system whose axioms are the axioms of PA+n, plus the axiom that PA+n is consistent.

Fix a particular Godel numbering scheme, and let and each denote well-formed formulas with one free variable. Then let denote the formula where we replace each instance of the free variable in with the Godel number of . If such a formula holds in the standard model of Peano Arithmetic, we interpret that as swerving against ; if its negation holds, we interpret that as not swerving against . In particular, we will prove theorems in PA+n to establish whether the agents we discuss swerve or don’t swerve against one another. Thus we can regard such formulas of arithmetic as decision-theoretic agents, and we will use ″source code″ to refer to their Godel numbers. and will be used interchangeably with and .

Some very simple agents are worth defining first, like SwerveBot, the modal agent that always swerves:

Definition 1 (SwerveBot).

And naturally, NerveBot, the modal agent that never swerves:

Definition 2 (NerveBot).
An intuitive sort of decision-making process to formalize might be something like “I swerve if and only if my opponent doesn’t swerve.” I’ll call that agent CarefulBot, and we might define it like this:
Definition 3 (CarefulBot).
It follows immediately that CarefulBot doesn’t swerve against SwerveBot and that it swerves against NerveBot. CarefulBot also swerves against itself. (Which seems desirable.)
Theorem 1.1.
Proof. By contradiction, assume . Then and we may derive .
But it seems like we can do better with an agent that reasons as follows: “If I can prove (if PA+2 entails) the statement “if I don’t swerve (if PA entails ), then they can prove that I don’t swerve (then PA+1 entails , and if they prove that I don’t swerve (if PA+1 entails ), then they swerve (then PA entails )”, then I don’t swerve (then PA entails . Otherwise, I swerve (PA entails ). I’ll call this agent AstuteBot, and we might define it as follows:
Definition 4. (AstuteBot).
If I didn’t mess up, then AstuteBot swerves against itself, swerves against NerveBot, and doesn’t swerve against SwerveBot or CarefulBot; and CarefulBot swerves against AstuteBot.
Theorem 1.2.
Proof. By contradiction, assume . Then and we may derive .
Theorem 1.3.
Proof. PA+1 , and PA+1 . Thus, .
Corollary 1.3.1.
Proof. The proof is immediate.
The dynamic between CarefulBot and AstuteBot seems a lot like the dynamic between the predictor and the agent in Agent-Simulates-Predictor problems.
Or, everything I have just written is nonsense.