I think I made a mistake in mentioning partial evaluation. It distracts from my main point. The point I’m making a mess of concerns the two questions Yoreth asks:
If the coming singularity is such a monumental, civilization-filtering event, why is there virtually no mention of it in the mainstream? If it is so imminent, so important, and furthermore so sensitive to initial conditions that a small group of computer programmers can bring it about, why are there not massive governmental efforts to create seed AI?
I read (mis-read?) the rhetoric here as containing assumptions that I disagree with. When I read/mis-read it, I feel that I’m being slipped the idea that governments have never been interested in AI. I also pick up a whiff of “the mainstream doesn’t know, we must alert them.” But mainstream figures such as John McCarthy and Peter Norvig know, and are refraining from sounding the alarm.
So partial evaluation is a distraction and I only made the mistake of mentioning it because it obsesses me. But it does! So I’ll answer anyway ;-)
Why am I obsessed? My Is Lisp a Blub post suggests one direction for computer programming language research. Less speculatively, three important parts of computer science are compiling (i.e., compiling by hand), writing compilers, and tools such as Yacc for compiling compilers. The three Futamura projections provide a way of looking at these three topics. I suspect it is the right way to look at them.
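To make that concrete, here is a toy sketch of the first projection in Python (the names are mine, and the “mix” is hand-rolled for one tiny interpreter rather than a real partial evaluator): specializing an interpreter to a fixed program yields a compiled version of that program.

    # First Futamura projection in miniature: mix(interp, prog) = compiled prog.
    # "specialize" is hand-written for this one interpreter, not a general
    # partial evaluator; all names are illustrative.

    def interpret(program, x):
        """Interpreter for straight-line arithmetic programs on one input x."""
        env = {"x": x}
        for op, dst, a, b in program:
            va = env[a] if isinstance(a, str) else a
            vb = env[b] if isinstance(b, str) else b
            env[dst] = va + vb if op == "add" else va * vb
        return env["out"]

    def specialize(program):
        """Residualize interpret() with the program known: the dispatch loop
        is unrolled at specialization time, leaving only arithmetic on x."""
        lines = ["def compiled(x):"]
        for op, dst, a, b in program:
            sym = "+" if op == "add" else "*"
            ra = a if isinstance(a, str) else repr(a)
            rb = b if isinstance(b, str) else repr(b)
            lines.append(f"    {dst} = {ra} {sym} {rb}")
        lines.append("    return out")
        namespace = {}
        exec("\n".join(lines), namespace)
        return namespace["compiled"]

    prog = [("mul", "t", "x", "x"), ("add", "out", "t", 1)]  # out = x*x + 1
    assert interpret(prog, 5) == specialize(prog)(5) == 26

The second projection specializes the specializer to the interpreter, yielding a compiler; the third specializes the specializer to itself, yielding a compiler generator.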
Lambda-the-ultimate had an interesting thread on the type-system feature-creep death-spiral. Look for the comment by Jacques Carette at Sun, 2005-10-30 14:10 linking to Futamura’s papers. So there is the connection: a theorem prover inside a partial evaluator.
Now partial evaluation looks like it might really help with self-improving AI. The AI might look at its source, realise that the compiler it is using to compile itself is weak because it is a Futamura-projection-based compiler with an underpowered theorem prover, prove some of the theorems itself, recompile, and start running faster.
Well, maybe, but the overviews I’ve read of the classic text by Jones, Gomard, and Sestoft make me think that the state of the art only offers linear speed-ups. If you write a bubble sort and use partial evaluation to compile it, it stays order n squared. The theorem prover will never transform it into an n log n algorithm.
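A rough illustration of why, under the usual model of counting comparisons: stripping away interpretive overhead changes only the constant factor. This straightforward bubble sort (no early exit) performs exactly n(n-1)/2 comparisons on every input, however cleverly it is compiled.

    # Bubble sort's comparison count is a property of the algorithm, not of
    # how it is executed; specialization cannot reduce it below n(n-1)/2.

    def bubble_sort_comparisons(xs):
        xs = list(xs)
        count = 0
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                count += 1
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return count

    for n in (10, 100, 1000):
        assert bubble_sort_comparisons(list(range(n, 0, -1))) == n * (n - 1) // 2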
I’m trying to learn ACL2. It is a theorem prover, and you can do things such as prove that quicksort and bubble sort agree. That is a nice result, and you can imagine it fitting into a bigger picture: the partial evaluator wants to transform a bubble sort into something better, and the theorem prover can anoint the transformation as correct. I see two problems.
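Here is the property in question stated as runnable Python rather than ACL2 (a sketch of mine): the random testing below merely samples the claim; the point of the theorem prover is to establish it for every input.

    # The equivalence a theorem prover would certify: bubble sort and
    # quicksort agree. Random testing only samples the property.
    import random

    def bubble_sort(xs):
        xs = list(xs)
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs

    def quicksort(xs):
        if len(xs) <= 1:
            return list(xs)
        pivot, rest = xs[0], xs[1:]
        return (quicksort([x for x in rest if x <= pivot]) + [pivot]
                + quicksort([x for x in rest if x > pivot]))

    for _ in range(1000):
        xs = [random.randint(0, 99) for _ in range(random.randint(0, 20))]
        assert bubble_sort(xs) == quicksort(xs)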
First, the state of the art is a long way from being automatic. You have to lead the theorem prover by the hand; it is really just a proof checker. Indeed, the ACL2 book says:
You are responsible for guiding it, usually by getting it to prove the necessary lemmas. Get used to thinking that it rarely proves anything substantial by itself.
It is a long way from proving (bubble sort = quicksort) on its own.
Second, that doesn’t actually help: there is no sense of performance here. It only says that the two sorts agree, without saying which is faster. I can see a way to fix this. ACL2 can be used to prove that interpreters conform to their semantics. Perhaps it can be used to prove that an instrumented interpreter performs a calculation in fewer than n log n cycles. Lifting the proofs from proofs about programs to proofs about interpreters running programs would thus allow ACL2 to talk about performance.
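Here is the shape of that lifting idea, with Python standing in for the formal setting (everything below is a sketch of mine, not ACL2): run the computation under an instrumented evaluator that bills each comparison, so a performance claim becomes an ordinary claim about the final value of a counter. Merge sort is used because it genuinely meets an n log n bound.

    # An instrumented evaluator: every comparison is billed to a meter, so
    # "runs in fewer than n log n comparisons" becomes a claim about m.cost.
    # (In the ACL2 setting the interpreter and the bound would both be
    # formal objects; this only shows the shape of the idea.)
    import math
    import random

    class Meter:
        def __init__(self):
            self.cost = 0

        def less(self, a, b):  # every comparison is billed here
            self.cost += 1
            return a < b

    def merge_sort(xs, m):
        if len(xs) <= 1:
            return list(xs)
        mid = len(xs) // 2
        left, right = merge_sort(xs[:mid], m), merge_sort(xs[mid:], m)
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if m.less(right[j], left[i]):
                out.append(right[j])
                j += 1
            else:
                out.append(left[i])
                i += 1
        return out + left[i:] + right[j:]

    for n in (128, 1024, 8192):
        m = Meter()
        merge_sort([random.random() for _ in range(n)], m)
        assert m.cost <= n * math.ceil(math.log2(n))  # the n log n bound holds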
This solution to problem two strikes me as infeasible. ACL2 cannot cope with the base level without hand-holding, which I have not managed to learn to give. I see no prospect of lifting the proofs to include performance without adding unmanageable complications.
Could performance issues be built into a theorem prover, so that it natively knows that quicksort is faster than bubble sort, without having to pass its proofs through a layer of interpretation? I’ve no idea. I think this is far ahead of the current state of computer science. I think it is preliminary to, and much simpler than, any kind of self-improving artificial intelligence. But that is what I had in mind as the right kind of theorem prover.
There is a research area of static analysis and performance modelling. One of my Go-playing buddies has just finished a PhD in it. I think he hopes to use the techniques to tune up the performance of the TCP/IP stack. I think he is unaware of, and uninterested in, theorem provers. I see computer science breaking up into lots of little specialities, each of which takes half a lifetime to master. I cannot see the threads being pulled together until the human lifespan is 700 years instead of 70.
Ah, thanks, I see where you’re coming from now. So ACL2 is pretty much state-of-the-art from your point of view, but as you point out, it needs too much handholding to be widely useful. I agree, and I’m hoping to build something that can perform fully automatic verification of nontrivial code (though I’m not focusing on code optimization).
You are right, of course, that proving quicksort faster than bubble sort is considerably more difficult than proving the two equivalent.
But the good news is, there is no need! All we need to do to check which is faster is throw some sample inputs at each and run tests. To be sure, that approach is fallible, but what of it? The optimized version only needs to be probably faster than the original. A formal guarantee is only needed for equivalence.
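Concretely, the kind of check I have in mind might look like this rough sketch (the candidates, trial counts, and sizes are arbitrary):

    # "Throw some sample inputs at each and run tests": keep whichever
    # candidate wins a majority of timed trials. Fallible by design.
    import random
    import timeit

    def bubble_sort(xs):
        xs = list(xs)
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs

    def quicksort(xs):
        if len(xs) <= 1:
            return list(xs)
        pivot, rest = xs[0], xs[1:]
        return (quicksort([x for x in rest if x <= pivot]) + [pivot]
                + quicksort([x for x in rest if x > pivot]))

    def probably_faster(f, g, trials=20, n=500):
        wins = 0
        for _ in range(trials):
            xs = [random.random() for _ in range(n)]
            t_f = timeit.timeit(lambda: f(xs), number=3)
            t_g = timeit.timeit(lambda: g(xs), number=3)
            if t_f < t_g:
                wins += 1
        return wins > trials // 2

    assert probably_faster(quicksort, bubble_sort)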
“But the good news is, there is no need! All we need to do to check which is faster is throw some sample inputs at each and run tests.”
“No need”? Sadly, it’s hard to use such simple methods as anything like a complete replacement for proofs. As an example which is simultaneously extreme and simple to state: naive quicksort has good expected asymptotic performance, but its (very unlikely) worst case degrades to the same quadratic behaviour as bubble sort. Thus, if you use quicksort naively (without, e.g., randomizing the input in some way) somewhere where an adversary has strong influence over the input seen by quicksort, you can create a vulnerability to a denial-of-service attack. This is easy to understand with proofs, but not so easy either to detect or to quantify with random sampling. Also, the pathological input has low Kolmogorov complexity, so the universe might well happen to give it to your system accidentally, even in situations where you aren’t faced with an actual malicious intelligent “adversary.”
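To illustrate (a sketch under the simple cost model of one comparison per element per partition): with a first-element pivot, an already-sorted list, about the lowest-Kolmogorov-complexity input there is, drives quicksort to its quadratic worst case, while random samples sit near n log n.

    # Naive quicksort with a first-element pivot: a random permutation costs
    # roughly 2n ln n comparisons, but an already-sorted input costs the
    # full n(n-1)/2, and random sampling is very unlikely to stumble on it.
    import random

    def quicksort_comparisons(xs):
        if len(xs) <= 1:
            return 0
        pivot, rest = xs[0], xs[1:]
        lo = [x for x in rest if x <= pivot]
        hi = [x for x in rest if x > pivot]
        return len(rest) + quicksort_comparisons(lo) + quicksort_comparisons(hi)

    n = 800
    print(quicksort_comparisons(random.sample(range(n), n)))  # ~2n ln n, about 10,700
    print(quicksort_comparisons(list(range(n))))              # n(n-1)/2 = 319,600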
Also sadly, we don’t seem to have very good standard technology for performance proofs. Some years ago I made a horrendous mistake in an algorithm preprint, and later came up with a revised algorithm. I also once spent more than a full-time week studying and implementing a published class of algorithms, only to conclude that I had wasted my time because the published performance claims are provably incorrect. Off and on since then I’ve put some time into looking at automated proof systems and the practicalities of proving asymptotic performance bounds. The original poster mentioned ACL2; I’ve looked mostly at HOL Light (for ordinary math proofs) and to a lesser extent Coq (for program/algorithm proofs). The state of the art for program/algorithm proofs doesn’t seem terribly encouraging. Maybe someday it will be a routine master’s thesis to, e.g., gloss Okasaki’s Purely Functional Data Structures with corresponding performance proofs, but we don’t seem to be quite there yet.
Part of the problem with these is that there are limits to how much can be proven about correctness of programs. In particular, the general question of whether two programs will give the same output on all inputs is undecidable.
Proposition: There is no Turing machine which, when given the descriptions of two Turing machines, accepts iff the two machines agree on all inputs.
Proof sketch: Suppose we had a machine A that accepts a pair of descriptions iff the corresponding Turing machines agree on all inputs. We shall show how to construct from A a machine H which solves the halting problem. Note that for any given machine D we can construct a machine [D, s] which mimics D run on the input string s (simply add states to D so that the machine first erases everything on the tape, writes out s, and then executes D as normal). Then, to determine whether a given machine T accepts a given input s, ask machine A whether [T, s] agrees with the machine that always accepts. Since we have now constructed a Turing machine which solves the halting problem, our original assumption, the existence of A, must be false.
There are other theorems of a similar nature that can be proven with more work. The upshot is that in general, there are very few things that a program can say about all programs.
Wouldn’t it have been easier to just link to Rice’s theorem?
I didn’t remember the name of the theorem, and my Googlefu is weak.
True. Test inputs suffice for an optimizer that on average wins more than it loses, which is good enough to be useful. But if you want guaranteed efficiency, that comes back to proof, and the current state of the art is a good way short of doing that in typical cases.