Intuitively, each agent in the group verifies that “if there is some argument compelling to everyone that we all cooperate, then I think everyone cooperates”.
But implicit in the math is “if I can prove that there is no such argument, then the hypothetical is moot, and I cooperate anyway”. !!
Is this a feature or a bug? Well I think this is part of how the group is ruling out a consistent world in which ¬E.
EDIT: Okay so Löb’s theorem says you can never prove in the theory that there are no proofs of X, due to the technicality in which □X means “a proof of any size”. See the follow-up comments.
Alas I don’t understand the □(□E→E) version well enough to give a compelling account of how it works.
So… it’s part of the setup that all of these agents will:
--Cooperate if they can prove that there is some argument compelling to everyone that everyone cooperates (because then they prove that everyone cooperates, and that includes them, and their proof system isn’t mistaken?)
--Cooperate if they can prove that there is no such argument.
--Else defect.
Am I getting that right?
For the setup □(□E→E)→E, it’s a bit more like: each member cooperates if they can prove that a compelling argument for “everyone cooperates” is sufficient to ensure “everyone cooperates”.
Your second line seems right though! If there were provably no argument for straight up “everyone cooperates”, i.e. □(□E→⊥), that would imply □(□E→E) and therefore E, a contradiction.
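Spelling the steps out as I understand them (my own reconstruction, using standard provability-logic moves plus the group’s rule □(□E→E)→E, so treat it as a sketch):
--Suppose □(□E→⊥), i.e. the theory proves there is no compelling argument for E.
--⊥→E is a tautology, so □(⊥→E), and chaining inside the box turns □(□E→⊥) into □(□E→E).
--The group’s rule □(□E→E)→E then fires, giving E.
--Worse: Löb’s theorem gives □(□E→E)→□E, so □E, hence □□E; pushing that through the assumed □(□E→⊥) yields □⊥.
So a consistent theory can never prove “there is no argument for E” in the first place, which is the point of the EDIT above.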
--
Also I think I’m a bit less confused here these days, and in case it helps:
Don’t forget that “□P” means “a proof of any size of P”, which is kinda crazy, and can be responsible for things not lining up with your intuition. My hot take is that Löb’s theorem / incompleteness says “with finite proof strength you can only deny proofs up to a limited size, on pain of diagonalization”. Which is way saner than the usual interpretation!
So idk, especially in this context I think it’s a bad idea to throw out your intuition when the math seems to say something else, since the mismatch is probably coming down to some subtlety in this formalization of provability/metamathematics. And I presently think the quirks of provability logic are often bugs due to bad choices in the formalism.
Thanks for the reply. I’m a bit over my head here, but isn’t this a problem for the practicality of this approach? We only get mutual cooperation because all of the agents have the very unusual property that they’ll cooperate if they find a proof that there is no such argument. That seems like a selfless and self-destructive property to have in most contexts; why would an agent self-modify into creating and maintaining this property?
(Thanks also to you for engaging!)
Hm. I’m going to take a step back, away from the math, and see if that makes things less confusing.
Let’s go back to Alice thinking about whether to cooperate with Bob. They both have perfect models of each other (perhaps in the form of source code).
When Alice goes to think about what Bob will do, maybe she sees that Bob’s decision depends on what he thinks Alice will do.
At this juncture, I don’t want Alice to “recurse”, falling down the rabbit hole of “Alice thinking about Bob thinking about Alice thinking about—” and so on.
Instead Alice should realize that she has a choice to make, about who she cooperates with, which will determine the answers Bob finds when thinking about her.
This manoeuvre is doing a kind of causal surgery / counterfactual-taking. It cuts the loop by identifying “what Bob thinks about Alice” as a node under Alice’s control. This is the heart of it, and imo doesn’t rely on anything weird or unusual.
I tried to formalize this, using A→B as a “poor man’s counterfactual”, standing in for “if Alice cooperates then so does Bob”. This has the odd behaviour of becoming “true” when Alice defects! You can see this as the counterfactual collapsing and becoming inconsistent, because its premise is violated. But this does mean we need to be careful about using these.
For technical reasons we upgrade to □A→B, which says “if Alice cooperates in a legible way, then Bob cooperates back”. Alice tries to prove this, and legibly cooperates if so.
This setup gives us “Alice legibly cooperates if she can prove that, if she legibly cooperates, Bob would cooperate back”. In symbols, □(□A→B)→A.
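For what it’s worth, assuming Bob runs the symmetric rule □(□B→A)→B, and that both rules are themselves verifiable inside the proof system, the handshake can be checked with a short Löbian argument (again my own reconstruction, a sketch rather than gospel):
--B→(□A→B) is a tautology, so □B→□(□A→B), and Alice’s rule turns that into □B→A. Symmetrically, □A→B.
--Hence □(A∧B)→(□A∧□B)→(B∧A), i.e. □(A∧B)→(A∧B) is provable.
--Necessitation gives □(□(A∧B)→(A∧B)), and Löb’s theorem gives □(A∧B).
--Running that back through □(A∧B)→(A∧B): both cooperate.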
Now, is this okay? What about proving □A→⊥?
Well, actually you can’t ever prove that! Because of Löb’s theorem.
Outside the system we can definitely see cases where A is unprovable, e.g. because Bob always defects. But you can’t prove this inside the system. You can only prove things like “□ₖA→⊥”, i.e. “there is no proof of A of length at most k”, for finite proof lengths k.
I think this is best seen as a consequence of “with finite proof strength you can only deny proofs up to a limited size”.
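(To gesture at why the bounded version escapes this, my own gloss: for a fixed k there are only finitely many candidate proofs of length ≤ k, and the theory can check each one and verify that none of them ends in A, which is how it can prove □ₖA→⊥. The unbounded □A→⊥ is a genuinely universal claim, and trying to prove it runs into the same Löbian pattern as with E above: □(□A→⊥) gives □(□A→A), hence □A by Löb, hence □□A, and then □⊥.)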
So this construction works out, perhaps just because two different weirdnesses are canceling each other out. But in any case I think the underlying idea, “cooperate if choosing to do so leads to a good outcome”, is pretty trustworthy. It perhaps deserves to be cashed out in better provability math.
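If it helps to see the handshake go through concretely: below is a small toy sketch, entirely my own construction rather than anything from the original post, that evaluates agents of the “cooperate exactly when □(□A→B)” shape on finite linear Kripke frames for GL. The assumptions are that “□φ holds at world w” means “φ holds at every world v < w”, and that the value the sequence stabilizes to tracks what actually gets proven; both are standard tricks for modal agents, but treat this as a sanity check rather than the real provability math.

```python
# Toy evaluation of modal agents on a finite linear Kripke frame (worlds 0..depth-1),
# where "box(phi) holds at world w" means "phi holds at every world v < w".
# A sketch under the assumptions stated above, not a proof-search engine.

def settle(alice_rule, bob_rule, depth=16):
    """Compute each agent's action world by world; return the values at the deepest
    world, where they have long since stabilized for agents this simple."""
    A, B = [], []  # A[w], B[w]: does Alice / Bob cooperate at world w?
    for w in range(depth):
        A.append(alice_rule(A[:w], B[:w]))  # each rule sees both histories at worlds < w
        B.append(bob_rule(B[:w], A[:w]))
    return A[-1], B[-1]

def handshake(own, opp):
    # Cooperate iff box(box(I cooperate) -> opponent cooperates): at every earlier
    # world v, "I cooperated at all worlds < v" must imply "opponent cooperated at v".
    return all((not all(own[:v])) or opp[v] for v in range(len(opp)))

def defectbot(own, opp):
    # Defects unconditionally.
    return False

print(settle(handshake, handshake))  # (True, True): the Löbian handshake goes through
print(settle(handshake, defectbot))  # (False, False): no one-sided cooperation
```

Two handshake agents settle on mutual cooperation, while a handshake agent facing an unconditional defector settles on defection, which matches the story above.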