Maaay-be, but I’m not at all convinced this is the right way to think about this. Suppose that we can become consistent, and suppose that there is some program P that could in principle be written down that simulates a humanity that has become consistent, and human mathematicians are going through all possible theorems in lexicographical order, and someone is writing down in a simple format every theorem humanity has found to be true—so simple a format that given the output of program P, you can easily find in it the list of theorems found so far. Thus, there is a relatively simple recursive enumeration of all theorems this version of humanity will ever find—just run P and output a theorem as soon as it appears on the list.
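To spell out what "relatively simple recursive enumeration" means here, a minimal sketch; simulate_P (run P for t steps and return its output) and read_list (parse the simple format and return the theorems written so far) are hypothetical stand-ins for the premise, not anything we could write today:

```python
# Minimal sketch of the enumeration described above. simulate_P and read_list
# are hypothetical helpers standing in for "run P for t steps" and "parse the
# simple format of the list"; they exist only by the assumptions of the setup.

def enumerate_theorems(simulate_P, read_list):
    """Yield every theorem the simulated humanity ever writes on its list,
    each exactly once, in the order it first appears."""
    seen = set()
    t = 0
    while True:
        for theorem in read_list(simulate_P(t)):  # list contents after t steps
            if theorem not in seen:
                seen.add(theorem)
                yield theorem
        t += 1
```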
Now suppose humanity does figure out that it’s living in a simulation, and figures out the source code of P. Then it knows its own Gödel sentence. Will this lead to a logical contradiction?
Well, if humanity does write down “our Gödel sentence is true” on the list, then that sentence will be false (because it says something equivalent to “this sentence does not appear on the list”). That actually doesn’t make humanity inconsistent, but it makes it wrong (unsound), which we’ll take to be just as bad (and we’ll get inconsistency if we also write down all logical consequences of this statement + the Peano axioms). So what if humanity does not write it down on the list? Then the statement is true, and it’s obvious to us that it’s true. Not a logical contradiction—but it does mean that humanity recognizes some things as true that it isn’t writing down on its list.
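For concreteness, here is the same case analysis compressed into the notation made precise further down (L(n) for "the proposition with Gödel number n appears on the list at some time", G for the Gödel sentence); the arrows are about truth in the standard model, not provability:

$$\begin{aligned}
\text{PA} &\vdash G \leftrightarrow \lnot L(\ulcorner G \urcorner)\\
G \text{ listed} &\;\Rightarrow\; L(\ulcorner G \urcorner) \;\Rightarrow\; \lnot G \quad \text{(a falsehood on the list; PA-closure then lists } \lnot G \text{ too)}\\
G \text{ never listed} &\;\Rightarrow\; \lnot L(\ulcorner G \urcorner) \;\Rightarrow\; G \quad \text{(true, recognizably so, but unlisted)}
\end{aligned}$$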
Paul Christiano has written some posts related to this (which he’s kindly pointed out to me when I was talking about some related things with him :-)). One way I think about this is by the following thought experiment: Suppose you’re taking part in an experiment supposedly testing your ability to understand simple electric circuits. You’re put in a room with two buttons, red and green, and above them two LEDs, red and green. When you press one of the buttons, one of the lights will light up; your task is to predict which light will light up and press the button of the same color. The circuit is wired as follows: Pressing the red button makes the green light light up; pressing the green button makes the red light light up. I maintain that you can understand perfectly how the mechanism works, and yet not be able to “prove” this by completing the task.
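A toy model of the task, just to make the impossibility explicit (the names are made up for the sketch, not part of the original thought experiment):

```python
# Toy model of the circuit task: each button lights the opposite-colored LED,
# so "predict the light and press the matching button" cannot be completed,
# no matter how well you understand the wiring.

OPPOSITE = {"red": "green", "green": "red"}

def light_for(button: str) -> str:
    """Which LED lights up when the given button is pressed."""
    return OPPOSITE[button]

# Success would require some button whose color matches the light it causes.
assert not any(b == light_for(b) for b in OPPOSITE)
```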
So, I’d say that “what humanity wrote on its list” isn’t a correct formalization of “what humanity understands to be true”. Now, it does seem possible that there still is a formal predicate that captures the meaning of “humanity understands proposition A to be true”. If so, I’d expect the problem with recognizing it not to be that it talks about something like large ordinals, but that it talks about the individual neurons in every human mind, and is far too long for humans to process sensibly. (We might be able to recognize that it says lots of things about human neurons, but not that the exact thing it says is “proposition A will at some point be understood by humanity to be true”.) If you try to include “human” minds scaled to arbitrarily large sizes (like 3^^^3 and so on), so that any finite-length statement can be comprehended by some large-enough mind, my guess would be that there is no finite-length predicate that can accurately reflect “mind M understands proposition A to be true” for arbitrarily large “human” minds M, though I won’t claim strong confidence in this guess.
You shouldn’t include things we know only by experience as part of our theoretical system, for the purpose of “the human Gödel sentence.” At best, learning a theorem from experience would add an axiom, but then our Gödel sentence changes. So if we knew our Gödel sentence, it would become something else.
Wait, why does that follow?
I postulated that “given the output of program P, you can easily find in it the list of theorems found so far”—by which I meant that it’s easy to write a program that takes the output of P until step t, and returns everything written on the list up to time t (was the confusion that it wasn’t clear that this was what I meant?). If you also know the source of P, you have a program that for every t returns the list up to time t, so it’s easy to write down the predicate L(n) of PA that says “there is some time t such that the proposition with Gödel number n appears on the list at time t.” By the diagonal lemma, there is a sentence G such that
PA |- G <-> not L(the Gödel number of G).
G is humanity’s Gödel sentence, and there is no trouble in writing it down inside the simulation, if you know the source of P and the source of the program that reads the list from P’s output.
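To flesh out why L(n) is easy to write down: it is just an unbounded existential over a perfectly concrete, computable check, sketched here with the same hypothetical helpers as before plus an assumed godel_number encoding (all of these names are placeholders for the setup's assumptions):

```python
# Hedged sketch of the decidable relation underlying L(n). simulate_P,
# read_list and godel_number are hypothetical, as above; L(n) itself is the
# unbounded existential "there exists a t such that appears_by(n, t)",
# which PA can express directly because appears_by is computable.

def appears_by(n: int, t: int, simulate_P, read_list, godel_number) -> bool:
    """Does the proposition with Gödel number n appear on the list
    within the first t steps of the simulation?"""
    return any(godel_number(p) == n for p in read_list(simulate_P(t)))
```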
(Well, technically, G is one Gödel sentence, and there could be other ways to write it that are harder to recognize, but one recognizable Gödel sentence should be enough for the “no AI” proof to go through if it were a well-formed argument at all, and I don’t think Stuart’s claim is just that there are some obfuscated ways to write a Gödel sentence that are unrecognizable.)