Scott Aaronson wrote:

Consider the “self-hating theory” PA+Not(Con(PA)), or Peano Arithmetic plus the assertion of its own inconsistency. We know that if PA is consistent, then this strange theory must be consistent as well—since otherwise PA would prove its own consistency, which the Incompleteness Theorem doesn’t allow. It follows, by the Completeness Theorem, that PA+Not(Con(PA)) must have a model. But what could such a model possibly look like? In particular, what would happen if, within that model, you just asked to see the proof that PA was inconsistent?
I’ll tell you what would happen: the axioms would tell you that a proof of PA’s inconsistency is encoded by a positive integer X. And then you would say, “but what is X?” And the axioms would say, “X.” And you would say, “But what is X, as an ordinary positive integer?”
“No, no, no! Talk to the axioms.”
“Alright, is X greater or less than 10^500,000?”
“Greater.” (The axioms aren’t stupid: they know that if they said “smaller”, then you could simply try every smaller number and verify that none of them encode a proof of PA’s inconsistency.)
“Alright then, what’s X+1?”
“Y.”
And so on. The axioms will keep cooking up fictitious numbers to satisfy your requests, and assuming that PA itself is consistent, you’ll never be able to trap them in an inconsistency.
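The dialogue above can be rendered as a short sketch. This is only an illustration of how the model deflects questions, not an actual nonstandard model of PA: the class name and behavior below are invented for the example.

```python
# A toy rendering of the dialogue: an object standing in for the
# "fictitious" number X. It answers every comparison with a concrete
# integer by claiming to be larger, and answers arithmetic questions
# by minting a fresh fictitious name.

import itertools

_fresh = (f"X{i}" for i in itertools.count(1))

class Fictitious:
    def __init__(self, name="X"):
        self.name = name

    def __gt__(self, other):
        # "Greater." -- than any standard integer you name
        return isinstance(other, int)

    def __lt__(self, other):
        # never smaller than a standard integer (you could check them all)
        return False

    def __add__(self, k):
        # "Alright then, what's X+1?"  "Y."
        return Fictitious(next(_fresh))

    def __repr__(self):
        return self.name

X = Fictitious()
print(X > 10**500_000)   # True: the axioms say "Greater."
print(X + 1)             # a fresh fictitious name
```

As long as every answer is consistent with the answers given so far, no finite sequence of questions traps the axioms in a contradiction.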
Taking this seriously, it seems that if PA is consistent, then PA + ¬Cons(PA) ⊢ ◻¬Cons(PA) is true, i.e. there exists a term, X, encoding a proof of ¬Cons(PA). It also seems, from “you’ll never be able to trap them”, that we can’t construct this proof. Which is to say, within some PA-like systems of interest, there are things (existentially) provable which are not (constructively) provable. So I agree that an AI should be able to accept statements provable in the second system before modifying to use that system. Our difference is between requiring existential proofs (which are not interesting to me, since things can be provable and yet never proved) and constructive proofs (a set whose membership is undecidable, yet which, by showing equivalence between the two systems or between the first and a subsystem of the second, we can accept). Again however, this is not a real limitation, because the constructive proof of a statement lets us infer that the statement is existentially provable. Thus we recover the “sensible” existentially provable statements, and discard the ones which are provable and yet unprovable.
I don’t think this is quite what you meant. This means PA + ¬Cons(PA) allows you to prove that you can prove ¬Cons(PA), where the second use of the word ‘prove’ does not make reference to any specific formal system. The turnstile refers to the specific theory that is being used, here PA + ¬Cons(PA), and the box symbol is used when the meaning is clear from context, but you’ve used both.
Our difference is between requiring existential proofs (which are not interesting to me, since things can be provable and yet never proved) and constructive proofs (a set whose membership is undecidable, yet which, by showing equivalence between the two systems or between the first and a subsystem of the second, we can accept). Again however, this is not a real limitation, because the constructive proof of a statement lets us infer that the statement is existentially provable. Thus we recover the “sensible” existentially provable statements, and discard the ones which are provable and yet unprovable.
This seems like a description of some constructive logic version of PA. I’m not sure exactly how that would work or if there is a standard constructive axiomatization of the natural numbers, but the set of theorems of constructive logic is a subset of those of classical logic, so this will not be able to prove that some given axiom system is trustable if the original system cannot do so.
You could add constructive theorems if you also add axioms, which I’m not sure whether you’re proposing. For example, given a universal Turing machine U and a number x, there is a shortest program p such that U(p) = x and it could be the case, depending on U and x, that PA can prove that such a p exists, but not what it is. Then, by adding axioms that U never halts on certain inputs, it can be possible to deduce the value of p.
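A hedged toy version of that point, with everything scaled down: here “programs” are strings over a tiny expression alphabet, U(p) is just Python evaluation, and the `nonhalting` set plays the role of the added axioms asserting that U never halts on certain programs, which lets the length-order search skip them. None of these names come from the discussion above; this is not a real universal machine.

```python
# Toy shortest-program search. `nonhalting` stands in for axioms of the
# form "U never halts on p"; with enough such axioms, the otherwise
# uncomputable search for the shortest p with U(p) == x can terminate.

from itertools import product

ALPHABET = "12+*"

def run(program):
    """U(program): evaluate the string, or None if it is ill-formed."""
    try:
        return eval(program)  # stand-in for running U on the program
    except Exception:
        return None

def shortest_program(x, nonhalting=frozenset(), max_len=4):
    """Search in length order for the shortest p with U(p) == x,
    skipping programs the assumed axioms say diverge."""
    for length in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=length):
            p = "".join(chars)
            if p in nonhalting:       # axiom: U never halts on p
                continue
            if run(p) == x:
                return p
    return None

print(shortest_program(4))           # "2+2"
print(shortest_program(4, {"2+2"}))  # excluding "2+2", finds "2*2"
```

In the real setting the skipped programs are ones whose halting status PA cannot settle, which is exactly why adding those axioms as assumptions yields theorems (the value of p) that the original system lacks.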
I don’t see how that could help with the reflection problem though. Do you propose adding soundness of some formal system as an axiom, but preventing Löb’s theorem from applying by switching to constructive logic? I think Löb’s theorem is still valid constructively; it doesn’t use negation, but I’m not sure if that is a sufficient criterion to be provable constructively.
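For reference, the theorem being discussed, in the usual modal notation (◻ for provability in the fixed theory T extending PA):

```latex
% Löb's theorem: if T proves that provability of P implies P,
% then T already proves P outright.
\Box(\Box P \rightarrow P) \rightarrow \Box P
% Equivalently, as a rule: from  T \vdash \Box P \rightarrow P,
% conclude  T \vdash P.
```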
It’s called Heyting arithmetic, but no, nowhere above do I mean intuitionistic logic when I say constructive. Or I don’t think I do. I recognize Löb’s theorem and am not trying to restrict any system’s expressive power to avoid it (e.g. by using Dan Willard’s self-verifying theories).
Maybe I’ll try to prove some sufficient conditions for a Gödel machine to self modify in a toy universe given this perspective. That should make it more convincing / concrete.