So8res comments on Open technical problem: A Quinean proof of Löb’s theorem, for an easier cartoon guide

So8res 25 Nov 2022 16:12 UTC
LW: 32 AF: 16
3
AF
(Epistemic status: quickly-recounted lightly-edited cached state that I sent in response to an email thread on this topic, that I now notice had an associated public post. Sorry for the length; it was easier to just do a big unfiltered brain-dump than to cull, with footnotes added.)

here’s a few quick thoughts i have cached about the proof of löb’s theorem (and which i think imply that the suggested technique won’t work, and will be tricky to repair):

#1. löb’s theorem is essentially just the Y combinator, but with an extra level of quotation mixed in.^[1]
```
Y : (A → A) → A
Y f := let g be (λ s. f (s s)) in g g

löb : □ (□ A → A) → □ A
löb f := let g be (λ s. f (s "s")) in g "g"
```
the standard proof of löb’s theorem (e.g. as it appears in the cartoon guide) is complicated by two points:

1a. one thing we need here is that the variable s is sensical, and in particular we need some S with S ≃ (□ S → A). the existence of such an S follows from the diagonal lemma.^[2]

1b. the other thing we need here is to show that quotation and normal proof steps (function application or modus ponens or whatever) interact in the “obvious way”, and don’t gum each other up. for instance, we need that □(A → B) ∧ □(A) yields □(B), showing that we can “apply quoted functions to quoted arguments”, in order for syntax like f (s "s") to be sensical.

if i were trying to make a guide to löb’s theorem today, i’d break it down as above; for me, at least, this breakdown is significantly more intuitive than the cartoon guide (which, iirc, doesn’t even go into the diagonal lemma, and just blends the Y combinator together with 1b).

this suggests that you’re going to be hard-pressed to do any self-reference without routing through the nomal machinery of löb’s theorem, in the same way that it’s hard to do recursion in the lambda calculus without routing through the Y combinator—you don’t need to use Y per se, but you’re unlikely to find a simpler way to do generic recursion.

this also suggests that you’re hiding all the work done by the normal proof of löb’s theorem in the definition of “zip”, and that this proof won’t turn out to be simpler when “zip” is cashed out in full.

#2. fun fact: the diagonal lemma is also just an instance of the Y combinator with an extra level of quotation.

in fact, there’s a proof of the diagonal lemma that goes by analogy to the proof-sketch of löb’s theorem (above). very roughly speaking, the way that you avoid infinite regress is that, in the step 1a, you need an S with S ≃ □ S → Type, and this S can be written manually because Type (or Proposition or w/e you want to use) is very rich (namely, it contains existential quantifiers and equality).^[3]

indeed, you can think of the diagonal lemma as simply an application of löb’s theorem where the target A is Type (or Proposition or w/e).

in fact, this analogy is quite tight. one cool corollary is that we can pop open tarski’s undefinability of truth, and replace the diagonal lemma with löb’s theorem. the result is a new theorem that i call “the undefinability of normalization”: no theory can prove that its terms normalize, on pain of contradiction.

(proof, in case it’s not obvious: invoke löb’s theorem on (not ∘ normalize : □ Bool → Bool), now you have syntax for a boolean value that’s provably-equal to its own negation.)

#3. fun fact: löb’s theorem does not hold for various breeds of normalized proofs (such as proofs with cuts eliminated—though exactly what counts as “normalized” in PA is somewhat subtle). in particular, when □ denotes normalized proofs, □(A → B) ∧ □(A) does not yield □(B), because feeding a normalized argument into a normalized function does not yield a normalized output. and while we (who trust the theory) know that unnormalized-box(B) implies normalized-box(B), no consistent theory can prove its own version of that fact, by the undefinability of normalization!

...which yields the hypothesis that the One True Counterfactuals are those facts which have short normalized proofs.

(recall: i think it was patrick who first set forth the trolljecture that counterfactuals correspond to short proofs of material implications? and this trolljecture was shot down by the existence of spurious counterfactuals, which use löb to deduce arbitrary material implications with a ~constant proof-length. but that counterexample doesn’t apply to normalized proofs! so the repaired trolljecture yet stands!)

(though i admit this hasn’t risen high enough in my priority-queue for me to actually try to prove nice things about this bold new notion of counterfactuals. also they obviously can’t be the whole answer, b/c normalized proofs about large worlds are large, etc.)
1. ↩︎
  slighly more formally: both the Y combinator and löb’s theorem are instantiations of a generalization of lawvere’s fixpoint theorem to semicategories. in the case of the Y combinator, we use some domain (in the sense of domain theory) that serves as semantics for the untyped lambda calculus. in löb’s theorem, iirc we use a semicategory where a morphism from A to B is a proof of □ A → B (tho this is an old cached thought that i haven’t checked). that the latter is not a category follows from gödel incompleteness; there’s no identity morphism at ⊥.
  
  note: i dislike semicategories, and if you have a more elegant way to unify Y and löb, i’m eager to hear it.
2. ↩︎
  in other words: the santa clause sentence is just the natural/obvious type of the variable in the Y combinator (as modified to work with gödel-codes). if you want a better intuition for it, i recommend the exercise of trying to figure out what type the variable s should have in the normal Y combinator.
3. ↩︎
  in fact, according to me, this is one of three keys to löb’s theorem.
  
  the first two keys are ofc gödel-codes + the Y combinator, with the caveat that making the Y combinator behave requires the diagonal lemma.
  
  and the diagonal lemma is again just gödel-codes + the Y combinator, except that in this very specific case we can make the Y combinator behave without already needing the diagonal lemma, by hand-coding in a workaround using quantifiers and equalities and whatnot.
  
  and the hand-coded workaround is kinda hacky. very roughly speaking, the trouble is that we want to let G(s) be F (s "s"), but showing that s "s" is sensical relies on the diagonal lemma, which we’re in the middle of proving and can’t invoke.
  
  and the solution is to let G(s) instead be ∃ x. just x = try (s "s") ∧ F x. we don’t need to know that s "s" is sensical, because we’re naming a proposition, and we can just bolt on a clause asserting that s "s" is sensical! we just build our sentence to say ”s "s" is sensical, and also …”, and thus sidestep the problem!
  
  in other words: we’re able to build a sentence that would be meaningful (it would mean a falsehood) even if s "s" wasn’t sensical, and this lets us form a sort of “guarded” version of the Y combinator (in the special case where we’re producing a sentence) that doesn’t depend critically on the sensicality of s "s". (which ends up making s "s" be sensical in the actual use-case of G "G", and so the guard ends up redundant and harmless.)
  
  (exercise: why can’t you use this technique to avoid the invocation of the diagonal lemma from the proof of gödel’s incompleteness theorem, i.e. löb’s theorem where the target A is ⊥?)
  
  this technique lets us slip around the need to invoke the diagonal lemma before we’ve finished proving the diagonal lemma. which is a critical step in the bootstrapping process.
  
  ...i don’t have this point entirely distilled yet, but hopefully that gives at least some intuition for what i mean when i say that löb is just gödel-codes + the Y combinator + a hacky workaround to get you off the ground.
- So8res 25 Nov 2022 16:16 UTC
  LW: 13 AF: 7
  0
  AF Parent
  an attempt to rescue what seems to me like the intuition in the OP:
  
  (note that the result is underwhelming, but perhaps informative.)
  
  in the lambda calculus we might say “given (f : A → A), we can get an element (a : A) by the equation (a := f a)”.
  
  recursive definitions such as this one work perfectly well, modulo the caveat that you need to have developed the Y combinator first so that you can ground out this new recursive syntax (assuming you don’t want to add any new basic combinators).
  
  by a directly analogous argument, we might wish define (löb f := f “löb f”). very direct! very quine-like!
  
  and, this proof works perfectly well. …modulo the caveat that you need to have developed a modified Y combinator first, so that you can ground out this recursive syntax (assuming you don’t want to add any new axioms).
  
  so if you prove the existence of a modified Y-combinator (that works w/ quotations), then you can add recursive syntax (modulo quotation) to your theory, and get a very “direct/quine-like” proof of löb’s theorem.
  
  the only hiccup is that the modified-Y-combinator is itself löb’s theorem, so the above proof amounts to “once you have proven löb’s theorem, there is a very direct and intuitive proof of löb’s theorem”, which… is perhaps less exciting :-p
  
  (bonus fun fact: in type theory, there’s an operator that forms quines. and the proof of löb’s theorem is extremely simple, given this operator; you just take a (syntactic) term of type □ A → A, and feed it the quotation of the result of this whole operation, using the quining operator. the hiccup is, ofc, that the quining operator is itself just löb’s theorem!)
  - So8res 25 Nov 2022 16:23 UTC
    LW: 15 AF: 9
    0
    AF Parent
    various attempts to distill my objection:
    
    the details of the omitted “zip” operation are going to be no simpler than the standard proof of löb’s theorem, and will probably turn out to be just a variation on the standard proof of löb’s theorem (unless you can find a way of building self-reference that’s shorter than the Y combinator (b/c the standard proof of löb’s theorem is already just the Y combinator plus the minimal modifications to interplay with gödel-codes))
    
    even more compressed: the normal proof of löb is contained in the thing labeled “zip”. the argument in the OP is operating under the assumption that a proof can make reference to its own gödel-number, and that assumption is essentially just löb’s theorem.
    
    a thing that might help is asking what specific proof-steps the step “1. the gödel-code P of this very proof proves C” cashes out into. obviously step 1 can’t cash out into verifying (under quotation) every step of the entire proof (by checking for the appropriate prime-factors), because then the proof would require at least one step for every step in the proof plus three extra steps at the end, which would make the proof infinite (and infinite proofs admit contradictions).
    
    so step 1 has to involve some indirection somehow. (cf: we can’t get a fixpoint in λ-calculus by f (f (f ..., we need some indirection, as occurs in (Y f) aka (λ s. f (s s)) (λ s. f (s s)), to make the term finite.)
    
    this works fine, as long as you are ok w/ the gödel-code in step 1 being not P, but some different proof of the same conclusion (analogous to the difference between f (Y f) and Y f), and so long as you have the analog of the Y combinator (modified to interplay with gödel-codes).
    
    but the analog of the Y combinator is precisely löb’s theorem :-p
    
    if i understand your OP correctly, we can factor it into three claims:
    
    when proving C, it’s valid to assume a referece «this» to the gödel-code of a proof of C
    given (1) and a proof f of □ C → C, there’s a proof of C by simply applying f to «this»
    you can arrange things such that if you desugar «this» into a particular numeral in “f «this»”, you’ll find that that numeral is the gödel-code of “f «this»”.
    
    (1) is löb’s theorem.
    
    (2) is an indirect proof of löb’s theorem, from the assumption of löb’s theorem (in direct analogy with how, once we’ve found syntax Y f for the fixpoint of f, we can wrap it in f to find an alternate syntax, namely f (Y f)).
    
    (3) is false (i’m fairly confident)--you can get real close, but you can’t quite get there (in direct analogy to how you can get λ-term Y f that means the same thing as f (Y f), but you can’t get a λ-term fix f that is syntactically exactly f (fix f), on pain of infinite syntax).
    
    afaict, there’s two ways to read this proposal (if we port across the analogy to λ-calculus), according to whether you cash “zip” out to a recursive instance of the whole proposal, vs whether you cash “zip” out to some different proof of löb’s theorem:
    
    “i hear that people find the Y combinator confusing. instead of using (Y f) to name the fixpoint of f, why not use f (f (f (f (... ad infinitum?”
    
    “i hear that people find the Y combinator confusing. instead of using (Y f) to name the fixpoint of f, why not use f (Y f)?”
    
    the answer to (1) is that it doesn’t make sense unless you allow infinite syntax trees (which aren’t permitted in normal math b/c infinite proofs admit contradictions).
    
    the answer to (2) is that, while (Y f) may be somewhat tricky to grasp, f (Y f) is surely no simpler :-p
    What links here?
    So8res's comment on Open technical problem: A Quinean proof of Löb’s theorem, for an easier cartoon guide by Andrew_Critch (26 Nov 2022 5:24 UTC; 2 points)
    - Amalthea 28 Nov 2022 20:43 UTC
      1 point
      0
      Parent
      I was interpreting things somewhat differently:
      “this” should be a reference to the Gödel code of the current proof
      the OP would like to have what is essentially an application of the diagonal lemma to the whole proof instead of a proposition
      “zip”/”unzip” is simply taking the Gödel numbering, and doesn’t hide any extra complexity
      In this case the problem here is making precise what one hopes for in step 2. (I don’t really see how one would do it), and then to prove that one can make it work.
      - So8res 28 Nov 2022 21:41 UTC
        3 points
        0
        Parent
        part of my point is that the standard proof of löb does essentially apply the main idea of the diagonal lemma to the whole proof instead of to a proposition. when you look at the standard proof of löb from the right angle, it looks very similar to the standard standard proof of the diagonal lemma (namely, they both look like lawvere’s fixpoint theorem, as does the Y combinator).
        
        löb lets you define a proof by reference to its own gödel-code, just like how the diagonal lemma lets you define a proposition by reference to its own gödel-code. (the former just happens to recurse into the latter at one point along the way, as an implementation detail.)
        
        (“but wait, if the mechanisms are so similar, then why doesn’t the diagonal lemma have to recurse into the diagonal lemma at one point along the way, as an implementation detail?” it almost does! and this almost sinks the ship! the diagonal lemma requires an extra hacky insight, that works only in the special case of the diagonal lemma, and that is required to get off the ground. i discuss this in footnote 2 of my original comment, iirc.)
        
        another part of my point is that the hard part of zip/unzip isn’t the part where you take the gödel-numbering, it’s the part where you make the gödel-numbering of the proof you’re currently naming available to the proof that you haven’t finished defining yet. that’s the tricky part of löb’s theorem. if we ignore that part, it’s no surprise that we find the remainder is simpler: we skipped the hard part!
        So8res 28 Nov 2022 21:50 UTC
        2 points
        0
        Parent
        (another thing that might help: when you’re proving an implication □ C → C, the gödel-number that you’re given doesn’t code for the proof of the implication you’re currently writing; that would be ill-typed. you asked for a □ C, not a □ (□ C → C). so the gödel-number you’re given isn’t a code for the thing you’re currently writing, it’s a code for löb’s theorem applied to the thing you’re currently writing.
        
        it is for this reason that the proof you’re fed might not be exactly the proof you were hoping for. you started out your implication being like “step 1: this proof is a proof of C”, but then the proof that you’re fed starts out saying “step 1: apply löb’s theorem to the following implication”, and only at step 2 is it like “this proof is a proof of C”.
        
        you might have really wanted the proof you’re fed to start with “step 1: this proof is a proof of C”. and you might say “well, inline the application of löb’s theorem there, so that the code i’m fed codes for a proof that actually starts that way”. this is like trying to get the fixpoint of f in the λ-calculus by way of f (f (f (...: you can do that for as many steps as you like, but you have to end at (Y f) at some point, on pain of infinite terms (which are contradictory in the mathematical setting).
        
        alternatively, you might accept that the proof you’re fed codes for a proof that starts by saying “apply löb’s theorem to the following implication: …”; this corresponds to being fed the term (Y f) and producing the term f (Y f).
        
        in neither case does it seem to me like there’s an idea for some alternative method of producing the self-reference, so as to sidestep the Y-combinator-esque machinery that makes up the bulk of the proof of löb’s theorem (and that is the source of the recursive call into the diagonal lemma).)
- Andrew_Critch 26 Nov 2022 4:31 UTC
  LW: 5 AF: 2
  0
  AF Parent
  this suggests that you’re going to be hard-pressed to do any self-reference without routing through the nomal machinery of löb’s theorem, in the same way that it’s hard to do recursion in the lambda calculus without routing through the Y combinator
  If by “the normal machinery”, you mean a clever application of the diagonal lemma, then I agree. But I think we can get away with not having the self-referential sentence, by using the same y-combinator-like diagonal-lemma machinery to make a proof that refers to itself (instead of a proof about sentences that refer to themselves) and checks its own validity. I think I if or someone else produces a valid proof like that, skeptics of its value (of which you might be one; I’m not sure) will look at it and say “That was harder and less efficient than the usual way of proving Löb using the self-referential sentence $Ψ$ and no self-validation”. I predict I’ll agree with that, and still find the new proof to be of additional intellectual value, for the following reason:
  - Human documents tend to refer to the themselves a lot, like bylaws.
  - Human sentences, on the other hand, rarely refer to themselves. (This sentence is an exception, but there aren’t a lot of naturally occurring examples.)
  - Therefore, a proof of Löb whose main use of self-references involves the entire proof referring to itself, rather than a sing sentence referring to itself, will be more intuitive to humans (such as lawyers) who are used to thinking about self-referential documents.
  The skeptic response to that will be to say that those peoples’ intuitions are the wrong way to think about y-combinator manipulation, and to that I’ll be like “Maybe, but I’m not super convinced their perspective is wrong, and in any case I don’t mind meeting them where they’re at, using a proof that they find more intuitive.”.
  Summary: I’m pretty confident the proof will be valuable, even though I agree it will have to use much of the same machinery as the usual proof, plus some extra machinery for helping the proof to self-validate, as long as the proof doesn’t use sentences that are basically only about their own meaning (the way the sentence $Ψ$ is basically only about its own relationship to the sentence C, which is weird).
  - Ben Pace 26 Nov 2022 4:37 UTC
    LW: 12 AF: 8
    0
    AF Parent
    This sentence is an exception, but there aren’t a lot of naturally occurring examples.
    No strong claim either way, but as a datapoint I do somewhat often use the phrase “I hereby invite you to <event>” or “I hereby <request> something of you” to help move from ‘describing the world’ to ‘issuing an invitation/command/etc’.
    What links here?
    A Löbian argument pattern for implicit reasoning in natural language: Löbian party invitations by Andrew_Critch (1 Jan 2023 17:39 UTC; 23 points)
    Andrew_Critch's comment on A Löbian argument pattern for implicit reasoning in natural language: Löbian party invitations by Andrew_Critch (1 Jan 2023 18:01 UTC; 2 points)
    - Andrew_Critch 27 Nov 2022 2:02 UTC
      LW: 2 AF: 1
      0
      AF Parent
      True! “Hereby” covers a solid contingent of self-referential sentences. I wonder if there’s a “hereby” construction that would make the self-referential sentence Ψ (from the Wikipedia poof) more common-sense-meaningful to, say, lawyers.
  - So8res 26 Nov 2022 5:24 UTC
    LW: 2 AF: 1
    0
    AF Parent
    which self-referential sentence are you trying to avoid?
    
    it keeps sounding to me like you’re saying “i want a λ-calculus combinator that produces the fixpoint of a given function f, but i don’t want to use the Y combinator”.
    
    do you deny the alleged analogy between the normal proof of löb and the Y combinator? (hypothesis: maybe you see that the diagonal lemma is just the type-level Y combinator, but have not yet noticed that löb’s theorem is the corresponding term-level Y combinator?)
    
    if you follow the analogy, can you tell me what λ-term should come out when i put in f, and how it’s better than (λ s. f (s s)) (λ s. f (s s))?
    
    or (still assuming you follow the analogy): what sort of λ-term representing the fixpoint of f would constitute “referring to itself (instead of being a term about types that refer to themselves)”? in what sense is the term (λ s. f (s s)) (λ s. f (s s)) failing to “refer to itself”, and what property are you hoping for instead?
    
    (in case it helps with communication: when i try myself to answer these questions while staring at the OP, my best guess is that you’re asking “instead of the Y combinator, can we get a combinator that goes like f ↦ f ????”, and the two obvious ways to fill in the blanks are f ↦ f (Y f) and f ↦ f (f (f (.... i discussed why both of those are troublesome here, but am open to the possibility that i have not successfully understood what sort of fixpoint combinator you desire.)
    
    (ETA: also, ftr, in the proof-sketch of löb’s theorem that i gave above, the term "g "g"" occurs as a subterm if you do enough substitution, and it refers to the whole proof of löb’s theorem. just like how, in the version of the Y combinator given above, the term g g occurs as a subterm if you do enough β-reduction, and it refers to the whole fixpoint. which i note b/c it seems to me that you might have misunderstood a separate point about where the OP struggles as implying that the normal proof isn’t self-referring.)
    
    ((the OP is gonna struggle insofar as it does the analog of asking for a λ-term x that is literally syntactically identical to f x, for which the only solution is f (f (f (... which is problematic. but a λ-term y that β-reduces pretty quickly to f y is easy, and it’s what the standard constuction produces.))
    - Andrew_Critch 27 Nov 2022 2:22 UTC
      LW: 3 AF: 1
      0
      AF Parent
      At this point I’m more interested in hashing out approaches that might actually conform to the motivation in the OP. Perhaps I’ll come back to this discussion with you after I’ve spent a lot more time in a mode of searching for a positive result that fits with my motivation here. Meanwhile, thanks for thinking this over for a bit.
      - So8res 27 Nov 2022 2:57 UTC
        LW: 4 AF: 2
        1
        AF Parent
        well, in your search for that positive result, i recommend spending some time searching for a critch!simplified alternative to the Y combinator :-p.
        
        not every method of attaining self-reference in the λ-calculus will port over to logic (b/c in the logical setting lots of things need to be quoted), but the quotation sure isn’t making the problem any easier. a solution to the OP would yield a novel self-reference combinator in the λ-calculus, and the latter might be easier to find (b/c you don’t need to juggle quotes).
        
        if you can lay bare the self-referential property that you’re hoping for in the easier setting of λ-calculus, then perhaps others will have an easier time understanding what you want and helping out (and/or you’ll have an easier time noticing why your desires are unsatisfiable).
        
        (and if it’s still not clear that löb’s theorem is tightly connected to the Y combinator, such that any solution to the OP would immediately yield a critch!simplified self-reference combinator in the λ-calculus, then I recommend spending a little time studying the connection between the Y combinator, löb’s theorem, and lawvere’s fixpoint theorem.)