Modal Fixpoint Cooperation without Löb’s Theorem
Followed By: Payor’s Lemma in Natural Language
TL;DR: This post introduces a novel logical approach to achieving group-scale cooperation, based on modal fixpoint theory. This approach is both easier to understand and roughly 3x more efficient than previous approaches that factored through Löb’s Theorem, measured in terms of the length / complexity of the proofs involved.
The following theorem was inspired by Scott Garrabrant, and uses a lemma of James Payor in place of Löb’s Theorem to prove cooperation between a group of agents. I’ll state the theorem for three agents because that’s most illustrative of what’s going on:
Theorem: Suppose and are agents that return “true” to signify cooperation and “false” to signify defection. Let , so is the statement that “everyone cooperates”. Let , , and denote proof systems that extend Peano Arithmetic, let stand for , and suppose the agents behave in a manner satisfying the following conditions:
Then it follows that , i.e., everyone cooperates (provably!).
(Intuitively, the strategy of the agents in this theorem is to check that the group is trustworthy in a certain way before joining (cooperating with) the group. The theorem shows that the collective check on trustworthiness nests inside itself in a way that self-validates and yields cooperation.)
Proof:, by combining 1, 2, and 3 with .
, from 4 by the definition of E and .
, by the lemma below with .
[end proof]
The key to the theorem is the following lemma, due to James Payor:
Lemma: If then .
Historical note: Originally this lemma assumed , but then I realized only the forward implication is needed, so I rewrote it.
Proof: The proof uses the same modal rules of inference for as Löb’s theorem, namely, necessitation and distributivity:
, by tautology ().
, from 1 by necessitation and distributivity.
, by assumption.
, from 2 and 3 by modus ponens.
, from 4 by necessitation.
, from 5 and 3 by modus ponens.
[end proof]
Sweet! In comparison to Löb’s Theorem, two things are beautiful about Payor’s lemma:
It sidesteps the use of an auxiliary fixpoint , by examining a proposition of interest () that itself has the fixpoint structure needed to self-validate; and
It also allows the construction of unexploitable modal agents without Löb’s Theorem, as in the theorem above.
Discussion
In the proof of the Theorem, you might be wondering if it really makes sense to be thinking of as a logical system of its own. It doesn’t need to be, but the answer is yes if , , and are all finite extensions of PA. Then the axioms of are just [the conjunction of axioms of ][the conjunction of axioms of ] [the conjunction of axioms of ].
You also might wonder if an alternative approach to group cooperation might be to instead use the following strategies:
Then you’d be right! Here it also follows that . However, the proof involves a lot more nesting, with A thinking about what B’s thinking about what C’s thinking about (etc.), and it’s not as easy or short as the proof of the Theorem above.
Writing implications backwards
If you think of in Payor’s Lemma as “defining” as a program, it’s fun to write all the implication arrows from right to left, so they look a little like variable assignments. (And, under the Curry-Howard correspondence, they do actually correspond to maps.) Here’s the lemma and proof again, written that way:
Lemma: If then .
Proof: The same as above, but with the arrows written leftward!
, by tautology ().
, from 1 by necessitation and distributivity.
, by assumption.
, from 2 and 3 by modus ponens.
, from 4 by necessitation.
, from 5 and 3 by modus ponens.
[end proof]
I find the above somewhat more elegant than the version with rightward arrows, albeit a less standard way of writing.
Conclusion
In my opinion, what’s great about the lemma and theorem above is that they’re both relatively short and simple (relative to proving and using Löb’s Theorem), and they allow a proof of unexploitable group cooperation that’s roughly three times shorter than than one that starts by proving Löb’s Theorem (only ~6 lines of logic, vs ~18).
PS James says his next idea will be even better ;)
- Acausal normalcy by 3 Mar 2023 23:34 UTC; 193 points) (
- Self-Referential Probabilistic Logic Admits the Payor’s Lemma by 28 Nov 2023 10:27 UTC; 80 points) (
- Probabilistic Payor Lemma? by 19 Mar 2023 17:57 UTC; 69 points) (
- Payor’s Lemma in Natural Language by 2 Mar 2023 12:22 UTC; 62 points) (
- Some constructions for proof-based cooperation without Löb by 21 Mar 2023 16:12 UTC; 43 points) (
- Worrying less about acausal extortion by 23 May 2023 2:08 UTC; 41 points) (
- How could AIs ‘see’ each other’s source code? by 2 Jun 2023 22:41 UTC; 29 points) (
- Acausal normalcy by 3 Mar 2023 23:35 UTC; 21 points) (EA Forum;
- 12 Mar 2023 10:56 UTC; 3 points) 's comment on Acausal normalcy by (EA Forum;
- 25 Mar 2023 9:18 UTC; 3 points) 's comment on continue working on hard alignment! don’t give up! by (
- 11 Feb 2023 15:27 UTC; 3 points) 's comment on Threatening to do the impossible: A solution to spurious counterfactuals for functional decision theory via proof theory by (
- 29 Jun 2024 2:00 UTC; 2 points) 's comment on Richard Ngo’s Shortform by (
I continue to think there’s something important in here!
I haven’t had much success articulating why. I think it’s neat that the loop-breaking/choosing can be internalized, and not need to pass through Lob. And it informs my sense of how to distinguish real-world high-integrity vs low-integrity situations.
I’d be interested in a more in-depth review where you take another pass at this.