Modal Fixpoint Cooperation without Löb’s Theorem

Followed By: Payor’s Lemma in Natural Language

TL;DR: This post introduces a novel logical approach to achieving group-scale cooperation, based on modal fixpoint theory. This approach is both easier to understand and roughly 3x more efficient than previous approaches that factored through Löb’s Theorem, measured in terms of the length /​ complexity of the proofs involved.

The following theorem was inspired by Scott Garrabrant, and uses a lemma of James Payor in place of Löb’s Theorem to prove cooperation between a group of agents. I’ll state the theorem for three agents because that’s most illustrative of what’s going on:

Theorem: Suppose and are agents that return “true” to signify cooperation and “false” to signify defection. Let , so is the statement that “everyone cooperates”. Let , , and denote proof systems that extend Peano Arithmetic, let stand for , and suppose the agents behave in a manner satisfying the following conditions:



  1. Then it follows that , i.e., everyone cooperates (provably!).

    (Intuitively, the strategy of the agents in this theorem is to check that the group is trustworthy in a certain way before joining (cooperating with) the group. The theorem shows that the collective check on trustworthiness nests inside itself in a way that self-validates and yields cooperation.)

    Proof:

  2. , by combining 1, 2, and 3 with .

  3. , from 4 by the definition of E and .

  4. , by the lemma below with .

    [end proof]

The key to the theorem is the following lemma, due to James Payor:

Lemma: If then .

Historical note: Originally this lemma assumed , but then I realized only the forward implication is needed, so I rewrote it.

Proof: The proof uses the same modal rules of inference for as Löb’s theorem, namely, necessitation and distributivity:

  1. , by tautology ().

  2. , from 1 by necessitation and distributivity.

  3. , by assumption.

  4. , from 2 and 3 by modus ponens.

  5. , from 4 by necessitation.

  6. , from 5 and 3 by modus ponens.

    [end proof]

Sweet! In comparison to Löb’s Theorem, two things are beautiful about Payor’s lemma:

  • It sidesteps the use of an auxiliary fixpoint , by examining a proposition of interest () that itself has the fixpoint structure needed to self-validate; and

  • It also allows the construction of unexploitable modal agents without Löb’s Theorem, as in the theorem above.

Discussion

In the proof of the Theorem, you might be wondering if it really makes sense to be thinking of as a logical system of its own. It doesn’t need to be, but the answer is yes if , , and are all finite extensions of PA. Then the axioms of are just [the conjunction of axioms of ][the conjunction of axioms of ] [the conjunction of axioms of ].

You also might wonder if an alternative approach to group cooperation might be to instead use the following strategies:

Then you’d be right! Here it also follows that . However, the proof involves a lot more nesting, with A thinking about what B’s thinking about what C’s thinking about (etc.), and it’s not as easy or short as the proof of the Theorem above.

Writing implications backwards

If you think of in Payor’s Lemma as “defining” as a program, it’s fun to write all the implication arrows from right to left, so they look a little like variable assignments. (And, under the Curry-Howard correspondence, they do actually correspond to maps.) Here’s the lemma and proof again, written that way:

Lemma: If then .

Proof: The same as above, but with the arrows written leftward!

  1. , by tautology ().

  2. , from 1 by necessitation and distributivity.

  3. , by assumption.

  4. , from 2 and 3 by modus ponens.

  5. , from 4 by necessitation.

  6. , from 5 and 3 by modus ponens.

    [end proof]

I find the above somewhat more elegant than the version with rightward arrows, albeit a less standard way of writing.

Conclusion

In my opinion, what’s great about the lemma and theorem above is that they’re both relatively short and simple (relative to proving and using Löb’s Theorem), and they allow a proof of unexploitable group cooperation that’s roughly three times shorter than than one that starts by proving Löb’s Theorem (only ~6 lines of logic, vs ~18).

PS James says his next idea will be even better ;)