“But the good news is, there is no need! All we need to do to check which is faster, is throw some sample inputs at each and run tests.”
“no need”? Sadly, it’s hard to use such simple methods as anything like a complete replacement for proofs. As an example which is simultaneously extreme and simple to state, naive quicksort has good expected asymptotic performance, but its (very unlikely) worst-case performance degrades to quadratic time, on par with bubble sort. Thus, if you use quicksort naively (without, e.g., randomizing the input in some way) somewhere where an adversary has strong influence over the input seen by quicksort, you can create a vulnerability to a denial-of-service attack. This is easy to understand with proofs, but not so easy either to detect or to quantify with random sampling. Also, the pathological input has low Kolmogorov complexity, so the universe might well happen to give it to your system accidentally even in situations where you aren’t faced with an actual malicious intelligent “adversary.”
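To make that concrete, here is a minimal, self-contained Python sketch (toy code of my own, not anything from the discussion above) of the quicksort failure mode: a first-element pivot goes quadratic on an already-sorted list, while a randomized pivot is expected O(n log n) on the same input.

```python
import random
import sys
import time

sys.setrecursionlimit(5000)  # the naive version recurses ~n deep on sorted input

def naive_quicksort(xs):
    # Pivot is always the first element: quadratic on already-sorted input.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    return (naive_quicksort([x for x in rest if x < pivot]) + [pivot] +
            naive_quicksort([x for x in rest if x >= pivot]))

def randomized_quicksort(xs):
    # Random pivot: expected O(n log n) regardless of how the input is ordered.
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)
    return (randomized_quicksort([x for x in xs if x < pivot]) +
            [x for x in xs if x == pivot] +
            randomized_quicksort([x for x in xs if x > pivot]))

# The "adversarial" input is nothing exotic -- just a sorted list,
# which is exactly the low-Kolmogorov-complexity case mentioned above.
adversarial = list(range(1500))

for sort in (naive_quicksort, randomized_quicksort):
    start = time.perf_counter()
    assert sort(adversarial) == adversarial
    print(sort.__name__, time.perf_counter() - start)
```

Uniformly random test inputs would almost never look like that sorted list, which is exactly why sampling tends to miss the worst case while a proof exposes it immediately.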
Also sadly, we don’t seem to have very good standard technology for performance proofs. Some years ago I made a horrendous mistake in an algorithm preprint, and later came up with a revised algorithm. I also spent more than a full-time week studying and implementing a published class of algorithms, only to conclude that I had wasted my time because the published performance claims are provably incorrect. Off and on since then I’ve put some time into looking at automated proof systems and the practicalities of proving asymptotic performance bounds. The original poster mentioned ACL2; I’ve looked mostly at HOL Light (for ordinary math proofs) and to a lesser extent Coq (for program/algorithm proofs). The state of the art for program/algorithm proofs doesn’t seem terribly encouraging. Maybe someday it will be a routine master’s thesis to, e.g., gloss Okasaki’s Purely Functional Data Structures with corresponding performance proofs, but we don’t seem to be quite there yet.
Part of the problem with such proof systems is that there are limits to how much can be proven about the correctness of programs. In particular, the general question of whether two programs give the same output on all inputs is undecidable.
Proposition: There is no Turing machine which, when given the descriptions of two Turing machines, accepts iff the two machines agree on all inputs.
Proof sketch: Suppose we had such a hypothetical machine A, accepting a pair of descriptions iff they correspond to two Turing machines which agree on all inputs. We shall show how we could use A to construct a machine H which solves the halting problem. Note that for any given machine D we can construct a machine [D, s] which mimics D when fed input string s (simply add states to D so that the machine first erases everything on the tape, writes out s on the tape, and then executes the normal procedure for D). Then, to determine whether a given machine T accepts a given input s, ask machine A whether [T, s] agrees with the machine that always accepts. Since we’ve now constructed a Turing machine which solves the halting problem (strictly, the acceptance problem, which is undecidable for the same reasons), our original assumption, the existence of A, must be false.
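Purely as an illustration of the shape of that reduction (none of these functions can really exist; the oracle parameter stands in for the impossible machine A), a small Python sketch:

```python
ALWAYS_ACCEPT = "a machine that accepts every input"  # fixed reference machine

def specialize(machine_desc, s):
    """Description of [D, s]: a machine that ignores its actual input,
    erases the tape, writes s, and then behaves like the machine
    described by machine_desc."""
    return f"erase tape; write {s!r}; then behave like ({machine_desc})"

def accepts(machine_desc, s, agrees_on_all_inputs):
    """Would decide whether the machine described by machine_desc accepts s,
    *if* we were handed the impossible oracle agrees_on_all_inputs
    (the machine A from the proposition). Since no such oracle exists,
    this is only a picture of the reduction, not a working decider."""
    return agrees_on_all_inputs(specialize(machine_desc, s), ALWAYS_ACCEPT)
```

The contradiction is that `accepts` would decide an undecidable problem, so `agrees_on_all_inputs` cannot be implemented.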
There are other theorems of a similar nature that can be proven with more work. The upshot is that in general, there are very few things that a program can say about all programs.
Wouldn’t it have been easier to just link to Rice’s theorem?
I didn’t remember the name of the theorem, and my Google-fu is weak.
True. Test inputs suffice for an optimizer that on average wins more than it loses, which is good enough to be useful, but if you want guaranteed efficiency, that comes back to proof, and the current state-of-the-art is a good way short of doing that in typical cases.