Löb’s theorem is pretty counterintuitive. Speaking informally about one or more agents engaging in logically valid reasoning, the theorem says something like this:
“If it’s believed that believing a particular prophesy would cause it to self-fulfill, then the prophecy will be believed.”
Formally, the theorem just says □(□C→C)→□C , where □X means ”X is provable”. This is most weird if X is a false statement, because it means you can’t prove that you can’t prove X (Gödel’s theorem).
(Also: the box symbol above is not an unrendered symbol; it’s supposed to be a box.)
Meta-motivation
I’d like to make Löb’s theorem more intuitive for humans, because it shows how agents can sidestep the need to mentally simulate each other in games, by instead using reasoning/arguments that happen to self-fulfill in a good and defensible way. Basically, Löbian reflection helps agents to avoid metacognitive stack overflows when thinking about themselves and each other, and I want more human beings to understand how that can work, and sometimes already does work, in real world agents.
State of the art
The best attempt I’ve seen to make Löb’s theorem more intuitive is Eliezer Yudkowsky’s Cartoon Guide to Löb’s Theorem, which is still quite confusing. The confusingness comes from thinking about a self-referential statement (Ψ on Wikipedia; S on Arbital) that’s used to carry out the proof. The statement basically says “If this statement is provable, then C.” Dealing with that sentence is pretty cumbersome, and requires a lot of fiddling around with nested statements and implications.
Doing better
I think we can make a new proof of Löb’s theorem that doesn’t use that weird self-referential sentence, by instead making the proof of Löb’s theorem itselfself-referential. Page 15 of the following paper poses an open problem on how to do this, which I think is possible to resolve affirmatively:
If we can make a proof like that work, we could then use the following much shorter and simpler cartoon guide to explain it:
In other words, I want to write a proof of Löb’s theorem that is structured much like a Quine. I’d like the details to (eventually) be really crisp and readable, so it can be peer reviewed for correctness, at a level of rigor comparable to this earlier paper on Löb’s theorem for proof systems with bounded length:
As a possible start to writing such a proof, I think some of the same machinery (e.g., the diagonal lemma) from Boolos’s textbook “The Logic of Provability” can be used to cook up self-referential proofs fitting the cartoon template above… thereby making Löb’s theorem less mysterious and more intuitive forever.
Further meta-motivation (added Nov 26)
A key reason I’m interested in having a self-referential self-validating proof, rather than a normal-ish proof about a sentence that is self-referential (like Ψ on Wikipedia), is that human documents often refer to themselves, but human sentences rarely refer directly to themselves in isolation. This sentence is an exception, but such utterances are generally rare. So, making Löb more intuitive to humans either means
making humans more accustomed to thinking about sentences that refer to themselves (so the traditional proof of Löb can be more intuitive), or
finding a new proof that self-references in a way that’s more like the way a human document refers to itself.
The aim of this post is to try for the second approach. Notice how you didn’t think the previous sentence was weirdly self-referential? That’s because it referred to the whole document that it’s part of, rather than just narrowly referring directly to itself. I’m not quite sure why humans are more comfortable with that than with directly-and-narrowly self-referential sentences. Maybe it’s because I thing needs to have enough non-self-referential content before humans will care about it enough to tolerate it talking about itself. I’m not sure.
Countering naive impossibility arguments (added Nov 26)
It’s in fact possible to use Löb’s theorem to construct a self-referential proof of the kind illustrated above. So, naive arguments that “Proofs can’t be self-referential like that” are not going to hold up. The hard thing I’m asking for is to make such a proof — of Löb’s theorem specifically — without going through the traditional proof of Löb, i.e., without using a narrowly-and-directly-self-referential sentence (like Ψ on Wikipedia).
Difficulty assessment
I think getting a Qunean proof to work here is probably a bit easier than the JSL paper linked directly above. Like, I think if I spent 3 months on it (which is how long the JSL paper took), then I could do it. I do think getting it right involves mentally syncing with some fundamental and standard machinery in provability logic, which can take a bit of time if you’ve never worked in the area before (as I hadn’t).
Anyway, if you can figure out how to write this proof, please post your solution or solution attempts here :)
Open technical problem: A Quinean proof of Löb’s theorem, for an easier cartoon guide
Motivation
Löb’s theorem is pretty counterintuitive. Speaking informally about one or more agents engaging in logically valid reasoning, the theorem says something like this:
“If it’s believed that believing a particular prophesy would cause it to self-fulfill, then the prophecy will be believed.”
Formally, the theorem just says □(□C→C)→□C , where □X means ”X is provable”. This is most weird if X is a false statement, because it means you can’t prove that you can’t prove X (Gödel’s theorem).
(Also: the box symbol above is not an unrendered symbol; it’s supposed to be a box.)
Meta-motivation
I’d like to make Löb’s theorem more intuitive for humans, because it shows how agents can sidestep the need to mentally simulate each other in games, by instead using reasoning/arguments that happen to self-fulfill in a good and defensible way. Basically, Löbian reflection helps agents to avoid metacognitive stack overflows when thinking about themselves and each other, and I want more human beings to understand how that can work, and sometimes already does work, in real world agents.
State of the art
The best attempt I’ve seen to make Löb’s theorem more intuitive is Eliezer Yudkowsky’s Cartoon Guide to Löb’s Theorem, which is still quite confusing. The confusingness comes from thinking about a self-referential statement (Ψ on Wikipedia; S on Arbital) that’s used to carry out the proof. The statement basically says “If this statement is provable, then C.” Dealing with that sentence is pretty cumbersome, and requires a lot of fiddling around with nested statements and implications.
Doing better
I think we can make a new proof of Löb’s theorem that doesn’t use that weird self-referential sentence, by instead making the proof of Löb’s theorem itself self-referential. Page 15 of the following paper poses an open problem on how to do this, which I think is possible to resolve affirmatively:
Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory (Critch, Dennis, Russell, 2022)
Here’s a screenshot from it:
If we can make a proof like that work, we could then use the following much shorter and simpler cartoon guide to explain it:
In other words, I want to write a proof of Löb’s theorem that is structured much like a Quine. I’d like the details to (eventually) be really crisp and readable, so it can be peer reviewed for correctness, at a level of rigor comparable to this earlier paper on Löb’s theorem for proof systems with bounded length:
A parametric, resource-bounded generalization of Löb’s theorem, and a robust cooperation criterion for open-source game theory. Journal of Symbolic Logic. (Critch, 2019)
As a possible start to writing such a proof, I think some of the same machinery (e.g., the diagonal lemma) from Boolos’s textbook “The Logic of Provability” can be used to cook up self-referential proofs fitting the cartoon template above… thereby making Löb’s theorem less mysterious and more intuitive forever.
Further meta-motivation (added Nov 26)
A key reason I’m interested in having a self-referential self-validating proof, rather than a normal-ish proof about a sentence that is self-referential (like Ψ on Wikipedia), is that human documents often refer to themselves, but human sentences rarely refer directly to themselves in isolation. This sentence is an exception, but such utterances are generally rare. So, making Löb more intuitive to humans either means
making humans more accustomed to thinking about sentences that refer to themselves (so the traditional proof of Löb can be more intuitive), or
finding a new proof that self-references in a way that’s more like the way a human document refers to itself.
The aim of this post is to try for the second approach. Notice how you didn’t think the previous sentence was weirdly self-referential? That’s because it referred to the whole document that it’s part of, rather than just narrowly referring directly to itself. I’m not quite sure why humans are more comfortable with that than with directly-and-narrowly self-referential sentences. Maybe it’s because I thing needs to have enough non-self-referential content before humans will care about it enough to tolerate it talking about itself. I’m not sure.
Countering naive impossibility arguments (added Nov 26)
It’s in fact possible to use Löb’s theorem to construct a self-referential proof of the kind illustrated above. So, naive arguments that “Proofs can’t be self-referential like that” are not going to hold up. The hard thing I’m asking for is to make such a proof — of Löb’s theorem specifically — without going through the traditional proof of Löb, i.e., without using a narrowly-and-directly-self-referential sentence (like Ψ on Wikipedia).
Difficulty assessment
I think getting a Qunean proof to work here is probably a bit easier than the JSL paper linked directly above. Like, I think if I spent 3 months on it (which is how long the JSL paper took), then I could do it. I do think getting it right involves mentally syncing with some fundamental and standard machinery in provability logic, which can take a bit of time if you’ve never worked in the area before (as I hadn’t).
Anyway, if you can figure out how to write this proof, please post your solution or solution attempts here :)