This is a good approach to dealing with the halting problem, but I don’t think Löb’s theorem is so closely related that getting around the halting problem means you also get around Löb’s theorem.
The theoretical AI that would run into Löb’s theorem would need more general proof-producing capability than these relatively simple program analyzers.
It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing the structure of a proof, which can then be filled in to a complete proof by supplying basic facts that the program can check. For example, I have a library that produces .NET assemblies, with bytecode that targets a stack-based virtual machine, and I want to verify that instructions that read elements off the stack get elements of the type they are expecting. So I wrote my library to keep track of the types that could be written to the stack at any execution point (represented by a stack of types). It is straightforward to compute the possible stack states after each instruction, given that instruction and the previous possible stack state, and to determine whether the instruction is legal given the previous state (well, branching makes it more complicated, but not too much). So, in this case, I came up with the structure of tracking possible states after each instruction, and then it was straightforward to write my program to fill in that structure; but I did not, and don’t know how to, write my program to come up with the proof structure itself. It is therefore easy to be confident that the proof will have nice properties, because the space of possible proofs with this structure is much smaller than the space of all possible proofs.
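To make this concrete, here is a minimal sketch of the idea over a toy instruction set rather than real .NET bytecode (the instruction set and the Verifier API below are invented for illustration, and branching is omitted):

```csharp
// Sketch of type-stack verification: simulate each instruction against a
// stack of *types* rather than values. Toy instruction set, illustration only.
using System;
using System.Collections.Generic;

enum Ty { Int32, String }

abstract record Instr;
record PushInt(int Value) : Instr;    // pushes an Int32
record PushStr(string Value) : Instr; // pushes a String
record AddInt() : Instr;              // pops two Int32s, pushes an Int32
record PrintStr() : Instr;            // pops a String

static class Verifier
{
    // Returns null if every pop sees the type it expects,
    // otherwise a description of the first ill-typed instruction.
    public static string? Check(IEnumerable<Instr> program)
    {
        var stack = new Stack<Ty>();
        int i = 0;
        foreach (var instr in program)
        {
            switch (instr)
            {
                case PushInt: stack.Push(Ty.Int32); break;
                case PushStr: stack.Push(Ty.String); break;
                case AddInt:
                    if (!Pop(stack, Ty.Int32) || !Pop(stack, Ty.Int32))
                        return $"instruction {i}: AddInt expects two Int32s";
                    stack.Push(Ty.Int32);
                    break;
                case PrintStr:
                    if (!Pop(stack, Ty.String))
                        return $"instruction {i}: PrintStr expects a String";
                    break;
            }
            i++;
        }
        return null; // proven: no instruction can pop a wrong type
    }

    static bool Pop(Stack<Ty> stack, Ty expected) =>
        stack.Count > 0 && stack.Pop() == expected;
}

static class Demo
{
    static void Main()
    {
        // Legal: Int32 + Int32.
        Console.WriteLine(Verifier.Check(new Instr[] {
            new PushInt(1), new PushInt(2), new AddInt() }) ?? "ok");
        // Ill-typed: AddInt finds a String on the stack.
        Console.WriteLine(Verifier.Check(new Instr[] {
            new PushInt(1), new PushStr("x"), new AddInt() }) ?? "ok");
    }
}
```

The checker never searches for a proof; it only fills in the one proof structure chosen in advance (a stack of types per program point).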
The theoretical AI that would run into Löb’s theorem would be able to come up with proof structures itself, as an AI that could only use proof structures prepackaged by human programmers would have huge gaps in its capabilities. This may introduce difficulties not seen in simple program analyzers.
In the example I asked about earlier, where an optimizer needs to prove things about its own future decisions, it runs into an issue that Löb’s theorem applies to. In order to prove that its own future decisions are good, it would rely on the fact that its future decisions are based on its own sound proof system, so it needs to assert that its own proof system is sound: that if its proof system says “X”, then X. Löb’s theorem says that, for all statements X, if a system P (at least as powerful as Peano arithmetic) says “P says ‘X’ implies X”, then P says X. So, if the system asserts its own soundness, it asserts everything.
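To spell that out symbolically (a standard rendering, with \Box_P X abbreviating “P says ‘X’”):

```latex
% Löb's theorem: if P proves that provability of X implies X,
% then P proves X outright.
\[
  P \vdash \bigl(\Box_P X \rightarrow X\bigr)
  \quad\Longrightarrow\quad
  P \vdash X
\]
% Asserting soundness inside P means adopting \Box_P X \rightarrow X as a
% schema for every X; by the theorem, P then proves every X.
```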
So, in summary:
Löb’s theorem is a problem for generally powerful formal-logic-based proof systems that assert their own soundness.
Program analyzers are formal-logic-based proof systems that do not run into Löb’s theorem, because they are not generally powerful and do not assert their own soundness.
Humans are generally powerful proof generators that can have confidence in their own soundness, but are not purely based on formal logic (we might intuitively dismiss a formal logical proof as “spurious”), and so can get around Löb’s theorem; but we don’t understand how humans do this well enough to write a program to do it.
Paul’s idea of a probabilistic proof system could lead to a generally powerful proof generator that has confidence in its own soundness, is not based on formal logic (so that it gets around Löb’s theorem), and that we understand well enough to write a program for.
It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing the structure of a proof, which can then be filled in to a complete proof by supplying basic facts that the program can check. For example, I have a library that produces .NET assemblies, with bytecode that targets a stack-based virtual machine, and I want to verify that instructions that read elements off the stack get elements of the type they are expecting. So I wrote my library to keep track of the types that could be written to the stack at any execution point (represented by a stack of types). It is straightforward to compute the possible stack states after each instruction, given that instruction and the previous possible stack state, and to determine whether the instruction is legal given the previous state (well, branching makes it more complicated, but not too much). So, in this case, I came up with the structure of tracking possible states after each instruction, and then it was straightforward to write my program to fill in that structure; but I did not, and don’t know how to, write my program to come up with the proof structure itself. It is therefore easy to be confident that the proof will have nice properties, because the space of possible proofs with this structure is much smaller than the space of all possible proofs.
I’m not sure I’ve understood what you have in mind here, but in the general case, complete type checking in .NET (that is, proving that an assembly is not only syntactically well-formed but also never throws type-related exceptions at runtime) is undecidable because of Rice’s theorem.
In the general case, complete type checking is as difficult as proving arbitrary claims in first-order logic.
I am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast or a throw statement) with a particular proof strategy that covers a huge space of useful, nicely structured programs.
See also jsteinhardt’s comment I was responding to, which discussed getting around the halting problem by allowing the checker to say “I don’t know”.
I am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast or a throw statement)
I’m not an expert on .NET, but is there anything that can throw a type exception other than an explicit cast or an explicit throw (or the standard library, I suppose)?
There are sequences of .NET instructions that result in the runtime throwing type exceptions, because it tries to read a value of a certain type off the stack and it gets an incompatible value. This is the situation that my verifier guards against.
The standard .NET runtime also includes a verifier that checks the same thing, and it will not run code that fails this validation unless it is explicitly trusted. So a verifiable .NET assembly will not throw type exceptions without an explicit cast or throw, but an arbitrary assembly may do so. The compilers for the standard languages such as C# will generally only produce verifiable assemblies unless you explicitly mark parts of the source code as “unsafe”, and unsafe or unverifiable assemblies need special permissions to run at all.
(There are other properties that both my verifier and the standard verifier check for. The reason I wrote my own is that it produces much more informative descriptions of the problems it finds, and it is integrated into my assembly-emitting libraries, so it detects problems as the assembly to be emitted is defined and, when run in the debugger, will easily show the compiler code and execution state that caused the problem.)
Thanks for the clarification.

So a verifiable .NET assembly will not throw type exceptions without an explicit cast or throw, but an arbitrary assembly may do so.
IIUC, unverifiable code does not, or at least is not guaranteed to, politely throw an exception should a type error occur. It may crash the runtime or fail silently, leaving the application in an incorrect state.
(There are other properties that both my verifier and the standard verifier check for. The reason I wrote my own is that it produces much more informative descriptions of the problems it finds, and it is integrated into my assembly-emitting libraries, so it detects problems as the assembly to be emitted is defined and, when run in the debugger, will easily show the compiler code and execution state that caused the problem.)
Ok. I thought that you were considering assemblies that passed the standard .NET verification and you were trying to check for some stronger property (such as absence of runtime exceptions caused by downcasts). That would have been equivalent to arbitrary first-order logic inference. Since you are instead checking for decidable properties, your system is indeed not equivalent to arbitrary first-order logic inference.
But as jsteinhardt says, it is actually possible to write verifiers that attempt to check for undecidable properties, provided that they have the option to give up.
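For example, here is a minimal sketch of a checker with that option, for a toy deterministic machine (the three-valued Verdict and all names here are illustrative, not any real API):

```csharp
// Sketch: a checker for an undecidable-in-general property (halting) that
// is allowed to give up instead of guessing.
using System;
using System.Collections.Generic;

enum Verdict { Proven, Refuted, Unknown }

static class GivingUpChecker
{
    // step: one step of a deterministic machine over int states; null = halt.
    //   Proven  - the machine was observed to halt,
    //   Refuted - a state repeated, so it provably loops forever,
    //   Unknown - the step budget ran out; the checker gives up.
    public static Verdict Halts(Func<int, int?> step, int start, int maxSteps)
    {
        var seen = new HashSet<int>();
        int? state = start;
        for (int i = 0; i < maxSteps; i++)
        {
            if (state is null) return Verdict.Proven;
            if (!seen.Add(state.Value)) return Verdict.Refuted; // cycle found
            state = step(state.Value);
        }
        return Verdict.Unknown; // "I don't know"
    }
}

static class DemoChecker
{
    static void Main()
    {
        // Counts down to 0 and halts: Proven.
        Console.WriteLine(GivingUpChecker.Halts(
            x => x <= 0 ? (int?)null : x - 1, start: 5, maxSteps: 100));
        // Oscillates between 0 and 1 forever: Refuted.
        Console.WriteLine(GivingUpChecker.Halts(
            x => 1 - x, start: 0, maxSteps: 100));
        // Counts down from far away; budget too small: Unknown.
        Console.WriteLine(GivingUpChecker.Halts(
            x => x <= 0 ? (int?)null : x - 1, start: 1000, maxSteps: 10));
    }
}
```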
My mathematical logic is a bit rusty, but my impression is that the following are true:
Gödel’s theorem presents a strictly stronger obstacle than Löb’s theorem. A reflectively consistent theory may still be incomplete, but any complete theory is necessarily reflectively consistent.
The undecidability of the halting problem is basically Gödel’s theorem stated in computational terms. If we could identify a subset L of Turing machines for which the halting problem can be decided, then, as long as it was closed under operations such as inserting a (non-self-referential) subroutine, we would be able to verify any (non-self-referential) property of a program that was also expressible in L. This is a sketch of a claim rather than an actual claim that I’ve proved, though.
Finally, I think it’s worth pointing out an actual example of a program analysis tool, since I think they are more powerful than you have in mind. The following slides are a good example of such a tool. At a high level, it gets around the problems you are worried about by constructing an over-approximation of the halting problem that is expressible in propositional logic (and thus decidable; in fact, it is even in NP). More generally, we can construct a sequence of approximations, each expressible in propositional logic, whose conjunction is no longer an approximation but in fact exactly the original statement.
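I don’t know that tool’s internals beyond the slides, but the shape of the trick can be shown in miniature, with step-bounded reachability standing in for the propositional encoding (everything below is an illustrative sketch, not the actual tool):

```csharp
// Bounded model checking in miniature. Approximation k of the safety
// property "no reachable state is bad" is "no state reachable in <= k steps
// is bad", which is decidable for each fixed k; the conjunction over all k
// is exactly the original property. The real tool encodes each bound as a
// propositional formula for a SAT solver; here we just simulate.
using System;
using System.Collections.Generic;

static class BoundedChecker
{
    // true = proven safe (fixed point reached), false = refuted at some k,
    // null = step budget exhausted, so the checker says "I don't know".
    public static bool? SafeUpTo(
        Func<int, IEnumerable<int>> successors, int start,
        Predicate<int> bad, int maxBound)
    {
        var frontier = new HashSet<int> { start };
        var reached = new HashSet<int> { start };
        for (int k = 0; k <= maxBound; k++)
        {
            foreach (var s in frontier)
                if (bad(s)) return false;     // approximation k is refuted
            var next = new HashSet<int>();
            foreach (var s in frontier)
                foreach (var t in successors(s))
                    if (reached.Add(t)) next.Add(t);
            if (next.Count == 0) return true; // no new states: all k proven
            frontier = next;
        }
        return null;                          // gave up at maxBound
    }
}

static class DemoBounded
{
    static void Main()
    {
        // States 0..10, step x -> x + 1, no successors past 10.
        Func<int, IEnumerable<int>> succ =
            x => x < 10 ? new[] { x + 1 } : Array.Empty<int>();
        Console.WriteLine(BoundedChecker.SafeUpTo(succ, 0, x => x == 42, 100)); // True
        Console.WriteLine(BoundedChecker.SafeUpTo(succ, 0, x => x == 5, 100));  // False
    }
}
```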
Gödel’s theorem presents a strictly stronger obstacle than Löb’s theorem.
Why do you say that? My understanding is that Gödel’s theorem says that a (sufficiently powerful) logical system has true statements it can’t prove, but these statements are excessively complicated and probably not important. Is there some way you envision an AGI being limited in its capacity to achieve its goals by Gödel’s theorem, as we envision Löb’s theorem blocking an AGI from trusting its future self to make effective decisions? (Besides cases where the goals are tailored to be blocked by the theorem: “Prove all true statements in my formal system”.)
Finally, I think it’s worth pointing out an actual example of a program analysis tool, since I think they are more powerful than you have in mind. The following slides are a good example of such a tool.
As near as I can tell, this is more powerful than other static analysis tools that I have seen (though maybe not; the Google Web Toolkit optimizer is pretty impressive), but it is well within what I expect to be possible, and it doesn’t get around Löb’s theorem. (I can’t be too confident in this assessment of its power, because I don’t see a clear claim of what sort of statements it can prove or of how it works, except that it seems to detect when inputs to a statement may have invalid values, and that it uses brute-force techniques to analyze functions and then associates a summary of the analysis with each function (constraints on valid inputs and guarantees on outputs?) so that its analysis of call sites can use the summary.) (This sort of program analyzer is interesting in its own right, and I would like to see a more complete explanation of it, but I don’t think its existence says anything about the problems posed by Löb’s theorem.)
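(For what it’s worth, my guess at the summary mechanism in miniature; this is purely hypothetical and not a description of the actual tool:)

```csharp
// Guessed "function summary" mechanism: analyze each function once, record
// a precondition/postcondition pair over an abstract domain (here, integer
// intervals), then reuse the summary at every call site.
using System;
using System.Collections.Generic;

record Interval(int Lo, int Hi)
{
    public bool Contains(Interval other) => Lo <= other.Lo && other.Hi <= Hi;
}

// Summary: inputs the function tolerates, outputs it guarantees.
record Summary(Interval ValidInputs, Interval GuaranteedOutputs);

static class SummaryAnalysis
{
    static readonly Dictionary<string, Summary> Summaries = new()
    {
        // A hypothetical integer square root: needs a non-negative argument,
        // promises a result in [0, 46340].
        ["isqrt"] = new Summary(new Interval(0, int.MaxValue),
                                new Interval(0, 46340)),
    };

    // At a call site, consult only the summary, never the callee's body.
    // Returns the guaranteed output range, or null to flag a possible
    // invalid input.
    public static Interval? CheckCall(string callee, Interval argRange)
    {
        var s = Summaries[callee];
        return s.ValidInputs.Contains(argRange) ? s.GuaranteedOutputs : null;
    }
}

static class DemoSummaries
{
    static void Main()
    {
        // Argument known to lie in [4, 100]: precondition satisfied.
        var ok = SummaryAnalysis.CheckCall("isqrt", new Interval(4, 100));
        Console.WriteLine(ok);
        // Argument might be negative: summary violated, call site flagged.
        var flagged = SummaryAnalysis.CheckCall("isqrt", new Interval(-1, 100));
        Console.WriteLine(flagged is null ? "possible invalid input" : flagged.ToString());
    }
}
```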
Do you agree or disagree that complete implies reflectively consistent? If you agree, then do you agree or disagree that this means avoidance of Gödelian obstacles implies avoidance of Löbian obstacles? If you agree with both of those statements, I’m confused as to why:
Gödel’s theorem presents a strictly stronger obstacle than Löb’s theorem.

is a controversial statement.
Ah, so by “Gödel’s theorem presents a strictly stronger obstacle than Löb’s theorem” you mean that if you overcome Gödelian obstacles, you also overcome Löbian obstacles? I think I agree, but I am not sure that it is relevant, because the program analyzer examples don’t overcome Gödelian obstacles, they just cope with them, which does not similarly imply coping with or overcoming Löbian obstacles.