Von Neumann’s critique of automata theory and logic in computer science
Quote from The General and Logical Theory of Automata. Corrected some typos using this version. H/T Hacker News.
There exists today a very elaborate system of formal logic, and, specifically, of logic as applied to mathematics. This is a discipline with many good sides, but also with certain serious weaknesses. This is not the occasion to enlarge upon the good sides, which I have certainly no intention to belittle. About the inadequacies, however, this may be said: Everybody who has worked in formal logic will confirm that it is one of the technically most refractory parts of mathematics. The reason for this is that it deals with rigid, all-or-none concepts, and has very little contact with the continuous concept of the real or of the complex number, that is, with mathematical analysis. Yet analysis is the technically most successful and best-elaborated part of mathematics. Thus formal logic is, by the nature of its approach, cut off from the best cultivated portions of mathematics, and forced onto the most difficult part of the mathematical terrain, into combinatorics.
The theory of automata, of the digital, all-or-none type, as discussed up to now, is certainly a chapter in formal logic. It would, therefore, seem that it will have to share this unattractive property of formal logic. It will have to be, from the mathematical point of view, combinatorial rather than analytical.
Probable Characteristics of Such a Theory. Now it seems to me that this will in fact not be the case. In studying the functioning of automata, it is clearly necessary to pay attention to a circumstance which has never before made its appearance in formal logic.
Throughout all modern logic, the only thing that is important is whether a result can be achieved in a finite number of elementary steps or not. The size of the number of steps which are required, on the other hand, is hardly ever a concern of formal logic. Any finite sequence of correct steps is, as a matter of principle, as good as any other. It is a matter of no consequence whether the number is small or large, or even so large that it couldn’t possibly be carried out in a lifetime, or in the presumptive lifetime of the stellar universe as we know it. In dealing with automata, this statement must be significantly modified. In the case of an automaton the thing which matters is not only whether it can reach a certain result in a finite number of steps at all but also how many such steps are needed. There are two reasons. First, automata are constructed in order to reach certain results in certain pre-assigned durations, or at least in pre-assigned orders of magnitude of duration. Second, the componentry employed has in every individual operation a small but nevertheless non-zero probability of failing. In a sufficiently long chain of operations the cumulative effect of these individual probabilities of failure may (if unchecked) reach the order of magnitude of unity, at which point it produces, in effect, complete unreliability. The probability levels which are involved here are very low, but still not too far removed from the domain of ordinary technological experience. It is not difficult to estimate that a high-speed computing machine, dealing with a typical problem, may have to perform as much as 10^12 individual operations. The probability of error on an individual operation which can be tolerated must, therefore, be small compared to 10^-12. I might mention that an electromechanical relay (a telephone relay) is at present considered acceptable if its probability of failure on an individual operation is of the order 10^-8. It is considered excellent if this order of probability is 10^-9. Thus the reliabilities required in a high-speed computing machine are higher, but not prohibitively higher, than those that constitute sound practice in certain existing industrial fields. The actually obtainable reliabilities are, however, not likely to leave a very wide margin against the minimum requirements just mentioned. An exhaustive study and a nontrivial theory will, therefore, certainly be called for.
Thus the logic of automata will differ from the present system of formal logic in two relevant respects.
1. The actual length of “chains of reasoning,” that is, of the chains of operations, will have to be considered.
2. The operations of logic (syllogisms, conjunctions, disjunctions, negations, etc., that is, in the terminology that is customary for automata, various forms of gating, coincidence, anti-coincidence, blocking, etc., actions) will all have to be treated by procedures which allow exceptions (malfunctions) with low but non-zero probabilities. All of this will lead to theories which are much less rigidly of an all-or-none nature than past and present formal logic. They will be of a much less combinatorial, and much more analytical, character.
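To put rough numbers on the reliability argument in the quoted passage, here is a small back-of-the-envelope check. The figures (10^12 operations, relay failure rates of 10^-8 to 10^-9) are von Neumann's; the script itself is only an illustrative sketch.

```python
# If a machine performs N independent operations and each fails with probability p,
# the chance of a completely error-free run is (1 - p)^N ~= exp(-N * p).
import math

N = 10**12  # operations in a "typical problem", per von Neumann's estimate

for p in (1e-8, 1e-9, 1e-13):
    p_clean_run = math.exp(-N * p)  # probability the whole computation is error-free
    print(f"p = {p:.0e}: P(no error) ~= {p_clean_run:.3g}, expected faults ~= {N * p:.0e}")
```

With relay-grade failure rates (10^-8 or 10^-9) an error somewhere in the run is essentially certain; only a per-operation failure probability well below 10^-12 makes a clean run likely, which is exactly the narrow margin he points to.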
I’m confused. It seems to me that if we already have a discrete/combinatorial mess on our hands, sprinkling some chance of failure on each little gear won’t summon the analysis fairy and make the mess easier to reason about. Or at least we need to be clever about the way randomness is added, to make the mess settle into something analytical. But von Neumann’s article sounds more optimistic about this approach. Does anyone understand why?
I think what he’s saying is that the existence of noise in computing hardware means that any computation done on this hardware must be (essentially) invariant to this noise, which leads the methods away from the precise, all-or-nothing logic of discrete math and into the fuzzier, smoother logic of probability distributions and the real line. This makes me think of analog computing, which is often done in environments with high noise and can indeed produce computations that are mostly invariant to it.
But, of course, analog computing is a niche field dwarfed by digital computing, which makes this prediction of von Neumann's look comically wrong: the solution people went with wasn't to re-imagine all computation in a noise-invariant way, it was to improve the hardware to the point that the noise became negligible. But I don't want to sound harsh here at all. The prediction was so wrong only because, at least as far as I know, there was no reasonable way at the time to predict the effect the transistor would have on computing; it had only just been invented. From that vantage point it seems reasonable to expect creative improvements in mathematical methods before a several-orders-of-magnitude improvement in hardware reliability.
I don’t think he’s saying that error is good. He’s saying that ignoring error is bad. And there’s a valley of bad rationality in which the error level is low enough that people ignore it, but it still matters. Specifically, he says that black box models that don’t include error are bad. You should model error of components and error propagation and design circuits that are robust to error. I’m not sure what this would mean in practice, though. Today we design systems that recover from computers crashing, but recovering from wrong computation is harder.
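For what it's worth, here is one hypothetical reading of "model error of components and error propagation" in practice: simulate the components with their failure probabilities and measure how errors accumulate. The sketch below pushes a bit through a chain of noisy buffers, each of which flips its input with probability p; the gate model and numbers are made up purely for illustration.

```python
import random

def noisy_chain(bit: int, n: int, p: float) -> int:
    """Pass `bit` through n buffers; each buffer flips its input with probability p."""
    for _ in range(n):
        if random.random() < p:
            bit ^= 1
    return bit

def error_rate(n: int, p: float, trials: int = 20_000) -> float:
    """Monte Carlo estimate of how often the chain's output disagrees with its input."""
    return sum(noisy_chain(0, n, p) != 0 for _ in range(trials)) / trials

p = 0.01
for n in (10, 100, 1000):
    exact = (1 - (1 - 2 * p) ** n) / 2  # closed form for comparison
    print(f"n = {n:>4}: simulated {error_rate(n, p):.3f}, exact {exact:.3f}")
```

The output error climbs toward 1/2 (a coin flip) as the chain gets longer, which is the "cumulative effect ... of the order of magnitude of unity" from the quote, and the kind of behavior a robust design would have to counteract.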
But this is 1948. In 1956, in “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” he showed how to take digital gates with known error rates and build from them digital gates with lower error rates. This opens up the possibility of getting the error low enough that you can ignore it and embrace digital logic. I’m guessing that he changed his mind and rejected the earlier paper, but it would be nice to have an explicit statement.
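For context, here is a toy version of that error-reduction idea, under two simplifying assumptions von Neumann does not make (independent gate failures and a perfectly reliable majority voter); his actual 1956 construction uses multiplexed bundles of lines with faulty restoring organs, which is considerably harder to analyze.

```python
def majority_error(p: float) -> float:
    """Error probability of a 2-out-of-3 majority vote over three independent gates,
    each wrong with probability p (voter assumed perfect)."""
    return 3 * p**2 * (1 - p) + p**3  # ~= 3p^2 for small p

p = 1e-4  # per-gate error rate (hypothetical)
for level in range(4):
    print(f"level {level}: effective error ~ {p:.2e}")
    p = majority_error(p)  # iterate: vote over already-voted outputs
```

For any p < 1/2 the vote strictly lowers the error rate, and iterating the construction drives it down quickly (here 10^-4 to roughly 3 x 10^-8 to roughly 2.7 x 10^-15), at the cost of a rapidly growing gate count; that trade-off between redundancy and reliability is the kind of thing the 1956 paper analyzes.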
#1 (at the end) sounds like complexity theory.
Some of what von Neumann says makes it sound like he’s interested in a mathematical foundation for analog computing, which I think has been done by now.