Link: Strong Inference
The paper “Strong Inference” by John R. Platt is an essay on scientific methodology, published in Science in 1964. It starts off with a wonderfully aggressive claim:
Scientists these days tend to keep up a polite fiction that all science is equal.
Platt then observes that some scientific fields progress much more rapidly than others. Why should this be?
I think the usual explanations we tend to think of—such as the tractability of the subject, or the quality or education of the men drawn into it, or the size of the research contracts—are important but inadequate… Rapidly moving fields are fields where a particular method of doing scientific research is systematically used and taught, an accumulative method of inductive inference that is so effective that I think it should be given the name “Strong Inference”.
The definition of Strong Inference, according to Platt, is the formal, explicit, and regular adherence to the following procedure:
Devise alternative hypotheses;
Devise a crucial experiment (or several of them), with alternative possible outcomes, each of which will, as nearly as possible, exclude one or more of the hypotheses;
Carry out the experiment so as to get a clean result;
(Goto 1) - Recycle the procedure, making subhypotheses or sequential hypotheses to refine the problems that remain; and so on.
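For the programmers in the audience, the loop structure of the procedure can be sketched in code. This is purely my illustration, not anything Platt wrote; every callable stands in for the scientist’s own judgment:

```python
def strong_inference(problem, devise_hypotheses, devise_experiment, run, refine):
    """Illustrative sketch of Platt's procedure; all callables are supplied by the scientist."""
    hypotheses = devise_hypotheses(problem)            # 1. devise alternative hypotheses
    while len(hypotheses) > 1:
        experiment = devise_experiment(hypotheses)     # 2. a crucial experiment whose outcomes exclude alternatives
        excluded = run(experiment)                     # 3. carry it out so as to get a clean result
        hypotheses = [h for h in hypotheses if h not in excluded]
    if hypotheses:                                     # 4. (Goto 1) recycle on the survivor's subproblems
        for subproblem in refine(hypotheses[0]):
            strong_inference(subproblem, devise_hypotheses, devise_experiment, run, refine)
```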
This seems like a simple restatement of the scientific method. Why does Platt bother to tell us something we already know?
The reason is that many of us have forgotten it. Science is now an everyday business. Equipment, calculations, lectures become ends in themselves. How many of us write down our alternatives and crucial experiments every day, focusing on the exclusion of a hypothesis?
Platt gives us some nice historical anecdotes of strong inference at work. One is from high-energy physics:
[One of the crucial experiments] was thought of one evening at suppertime: by midnight they had arranged the apparatus for it, and by 4am they had picked up the predicted pulses showing the non-conservation of parity.
The paper emphasizes the importance of systematicity and rigor over raw intellectual firepower. Roentgen, proceeding systematically, shows us the meaning of haste:
Within 8 weeks after the discovery of X-rays, Roentgen had identified 17 of their major properties.
Later, Platt argues against the overuse of mathematics:
I think that anyone who asks the question about scientific effectiveness will also conclude that much of the mathematicizing in physics and chemistry today is irrelevant if not misleading.
(Fast forward to the present, where we have people proving the existence of Nash equilibria in robotics and using Riemannian manifolds in computer vision, when robots can barely walk up stairs and the problem of face detection still has no convincing solution.)
One of the obstacles to hard science is that hypotheses must come into conflict, and one or the other must eventually win. This creates sociological trouble, but there’s a solution:
The conflict and exclusion of alternatives that is necessary to sharp inductive inference has been all too often a conflict between men, each with his single Ruling Theory. But whenever each man begins to have multiple working hypotheses, it becomes purely a conflict between ideas.
Finally, Platt suggests that all scientists continually bear in mind The Question:
But, sir, what experiment could disprove your hypothesis?
----
Now, LWers, I am not being rhetorical, I put these questions to you sincerely: Is artificial intelligence, rightly considered, an empirical science? If not, what is it? Why doesn’t AI make progress like the fields mentioned in Platt’s paper? Why can’t AI researchers formulate and test theories the way high-energy physicists do? Can a field which is not an empirical science ever make claims about the real world?
If you have time and inclination, try rereading my earlier post on the Compression Rate Method, especially the first part, in the light of Platt’s paper.
Edited thanks to feedback from Cupholder.
Most of the empirical sciences you mentioned involve understanding things that already exist, rather than bringing new things into existence.
This sounds like a disguised query. What are you really asking about AI? Separate the definition from the implications; what’s an “empirical science”, what further properties does the definition imply about entities it describes, and does AI need those properties to accomplish its goal?
(I’d describe it more as a research program that draws from several empirical sciences (cognitive science, etc.) and sometimes motivates advances in those fields, in the same way that nuclear physics is an empirical science, while the projects to create nuclear weaponry were not empirical sciences in themselves but were research goals drawing heavily from scientific knowledge.)
Within limited domains, they do. Face recognition, for instance. You are correct that it’s not a solved problem, but any new theory of face recognition, or in general any proposed computational model of the visual cortex, can be trivially tested: have it look through pictures and recognize faces, and compare it to how well untrained humans can recognize faces.
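A minimal version of that test might look like the following sketch. Everything here is a hypothetical stand-in (the detector, the labeled image set, the human baseline), not a reference to any particular benchmark:

```python
def evaluate_face_detector(detector, labeled_images, human_accuracy):
    """Score a candidate detector against a human baseline on a labeled image set.

    detector(image) -> bool  (does the model think a face is present?)
    labeled_images  -> list of (image, has_face) pairs
    human_accuracy  -> fraction of the same images untrained humans get right
    """
    correct = sum(detector(image) == has_face for image, has_face in labeled_images)
    model_accuracy = correct / len(labeled_images)
    return model_accuracy, model_accuracy >= human_accuracy
```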
Also, reiterating a comment I made a couple days ago on a similar statement of yours:
You seem to be jumping from “AI isn’t a hard science like physics because we don’t know exactly what we’re trying to mathematically model” (since it studies minds-in-general rather than just the minds we can directly do experiments on) to “math isn’t useful in AI or computer vision”. Maybe computer vision can benefit from Riemannian manifolds, maybe it can’t, maybe it’s a privileged hypothesis that we shouldn’t even be bothering to ask about, but do you really expect that any technical solution will be non-mathematical?
Isn’t this rather like building a few cars, comparing how well they run, and calling it physics?
Okay, but my suggestion is that this mode of empirical verification just isn’t good enough. People have been using this obvious method for decades, and we still can’t solve face detection (face recognition is presumably much harder than mere detection). This implies we need a non-obvious method of empirical evaluation.
No, but I do believe that over-mathematization is a real problem.
Okay, but non-”whatever scientists are currently doing” is not an epistemology! You need to say what specific alternative you believe would be better, and you haven’t done so. Instead, you’ve just made broad, sweeping generalizations about how foolish most scientists are, while constantly delaying the revelation of the superior method you think they should be using.
Please, please, just get to the point.
As for facial recognition, the error was in believing that it should be simple to explain what we’re doing to a blank slate. Our evolutionary history includes billions of years of friend-or-foe, kin-or-nonkin identification, which means the algorithm that analyzes images for faces will be labyrinthine and involve a bunch of hammered-together pieces. But I don’t see anything you’ve proposed that would lead to a faster solution of this problem; just casual dismissals that everyone’s doing it wrong because they use fancy math that just can’t be right.
FWIW, the best way to reverse-engineer a kind of cognition is to see what it gets wrong so you know what heuristics it’s using. For facial recognition, that means optical illusions regarding faces. For example, look at this.
The bottom two images are the same, but flipped. Yet one face looks thinner than the other. There’s a clue worth looking at.
The top two images are upside-down pictures of Margaret Thatcher that don’t seem all that different, but when you flip them over to see them right-side up in your photo-viewing tool (and consider this your WARNING), one looks hideously deformed. There’s another clue to look at.
Now, how would I have known to do that from the new, great epistemology you’re proposing?
How right you are about the special nature of face recognition. Another clue is the difficulty of describing a face in words so that another person can picture it. We use aspects of the face that do not register in consciousness to make the identification. Still another clue is that false faces ‘pop out’ in our vision. Cracks in pavement, clouds in the sky, anything vaguely like a face can pop out as a face. And again, faces are the first things that babies take an interest in. Their eyes will fasten on a plate with a simple happy face (two dots and a line) drawn on it. It is certainly not ordinary, everyday vision.
Can you easily read upside down?
I’m asking because neither of the upside-down face illusions seems to “work” for me. (I was immediately startled by the deformed version of Margaret Thatcher.)
Yes, pretty easily (just checked).
So the deformed Thatcher picture wasn’t any more startling upon turning the image over? Well, you may have some unusual aspects to your visual cognition. Try it with some more faces and see if the same thing comes up.
Another possibility would be to look at visual recognition systems in other species.
If we’re talking about “empirical verification”/”empirical evaluation”, then we’re talking about how to check if a possible solution is correct, not about how to come up with possible solutions. This method of verification keeps coming up negative because we haven’t solved face detection, not the other way around.
So if we’re doing something wrong, it’s in the way we’re trying to formulate answers, not in the way we’re trying to check them. And even there, saying we need more “non-obvious” thinking has the same problem as saying that something needs complexity: it’s not an inherently desirable quality on its own, and it doesn’t tell us anything new about how to do it (presumably people already tried the obvious answers, so all the current attempts will be non-obvious anyway… hence Riemannian manifolds). If you have some specific better method that happens to be non-obvious, then just say what it is.
My gut feeling is that the top-level post doesn’t give a nice summary of what ‘strong inference’ actually is, so here’s a snippet from Platt’s paper that does:
Thanks, I edited the post to reflect this suggestion/comment.
I would like to observe that you can divide sciences as follows:
1) physical sciences (Physics and Chemistry are models): investigating the fundamentals using the old idea of the scientific method, ie hypothesis to experiment to accepting or rejecting hypothesis. It is a predict and test method.
2) historic sciences (Biology, Geology, Cosmology are models, but not Biochemistry and Biophysics, which go in the first group): uses the theories of physical science to create historic theories (like evolution or plate tectonics) about how things became what they are. It is only rarely that a hypothesis can be directly tested by experiment; the method proceeds roughly from observation/taxonomy, to hypotheses explaining bodies of observations, to simple experiments showing that the proposed processes are possible. It is a collect-data-and-organize-it-into-an-integrated-story sort of method, and very inductive.
3) inventive sciences (Engineering, Medicine are examples): uses the theories of the physical and historic sciences to create useful and/or profitable things. Here the method is to identify a problem, then look for solutions and test the proposed solutions in trials.
Associated with these sciences are the theoretical areas (Mathematics, Information Theory are examples). These do not test theories and are not in the business of predicting and testing against reality; they are deductive rather than inductive. They create theoretical structures that are logically robust and can be used by the other sciences as powerful tools.
I have left out the social sciences, history proper, linguistics, anthropology and economics because it is not clear to me that they are even sciences and they do not fit the mold of mathematics either. But they are (like the others) large communal scholarships and I am not putting them down when I say they may not be sciences.
My definition of a science is a communal scholarship that
a) is not secretive but public using peer reviewed publication (or its equivalent) in enough detail that the work could be repeated,
b) is concerned with understanding material physical reality, and with doing so by testing theories in experiments or their equivalents (i.e., no magical or untestable explanations),
c) accepts the consensus of convinced scientists in the appropriate field rather than an authority as a measure of truth. The method by which the scientists are convinced may be anything in principle, but scientists are not likely to be convinced if the math has mistakes, logic is faulty, experiments are without controls, supernatural reasons are used etc. etc.
AI would definitely fall into the inventive sciences. An appropriate method would be to identify a problem, invent solutions using knowledge of physical and historic science and the tools of math etc., see if the solutions work in systematic tests. None of these is simple. To identify a problem, you need to have a vision of what is the end point success and a proposed path to get there. Vision does not come cheap. Invention of solutions is a creative process. Testing takes as much skill and systematic, clear thinking as any other experimental-ish procedure.
It seems that well-established physical or historical sciences invariably serve as the theoretical underpinning for each of the inventive sciences (electrical/mechanical/chemical engineering has the physical sciences, medicine has biology, etc.). What is the theoretical underpinning of AI? Traditionally it has been computer science, but on the face of it CS says little or nothing about the mechanisms of intelligence. Neuroscience isn’t quite it either, since neuroscience is focused on describing the human brain, and any principles that might apply to intelligence in general need to be abstracted away.
It would seem, then, that AI has no theoretical underpinning, and one can make a good argument that the lack of advanced AI is due to this very fact. Certainly the goal of AI is to (eventually) engineer machine intelligence, but it would seem that a major focus of present-day AI is to acquire theoretical insight that would serve as the foundation of an engineering effort. I think this shows that either AI is not just an inventive science, or that we need to talk separately about intelligence science and intelligence engineering.
“AI has no theoretical underpinning” Very good—that is the hole in what I said about AI. I have always thought of AI as a part of computer science but of course it is possible to think of it as separate. In that case it is underpinned by computer science and (?). Neuroscience if you are trying to duplicate human intelligence (or even ant intelligence) in a non-biological system. But neuroscience is not sufficiently developed to underpin anything. I don’t know how far Information Theory has progressed but I suspect it is not up to the job at present.
Excellent analysis. This is the kind of discussion I was looking for.
Note that it is necessary to do empirical science before inventive science becomes possible. Engineering depends almost completely on knowledge of physical laws. So a plausible diagnosis of the cause of the limited progress in AI is that it’s an attempt to do invention before the relevant empirical science has become available.
Historically, invention often, maybe usually, precedes the science that describes it. Thermodynamics grew out of steam engines, not the other way around, and the same for transistors in the fifties, for two examples off the top of my head. I suspect it is because the technology provides a simpler and clearer example of the relevant science than natural examples. And the development of the empirical sciences is useful to the further development of the technologies.
I’m reminded of Gawande on the importance of using checklists in medicine.
To me, the major questions are: a) Is Platt’s thesis correct? b) How much further has this process proceeded since 1964? c) Do any or most applied sciences currently do the sort of science Platt advocates? d) If not, does this imply the possibility of a soft takeoff via singleton through the reinvention of science?
Re: Is artificial intelligence, rightly considered, an empirical science? If not, what is it?
Not exactly science: more technology and engineering.
The current state of AI seems to fit comfortably within Eric Drexler’s “Theoretical Applied Science”, see Appendix A of “Nanosystems” for a description and a “contrast and comparison” with engineering and theoretical physics.
Are there any particular examples of fields where “overmathematizing” demonstrably slowed down research?
How about String Theory in physics and the Gaussian Copula in finance?
Not demonstrable yet. If a correct theory of quantum gravity is found that turns out not to involve such sophisticated mathematics, then you’ll have a case for this. Right now all you have is the opinion of a few contrarians.
It should be string theorists’ job to defend string theory by actually producing something. A proposed theory is not innocent until proven guilty.
The point is, that’s not a case where we’ve actually found the way ahead, and shown that the mathematical speculation was a dead end. It’s therefore relatively weak evidence.
And given the number of times that further formalizing a field has paid off in the physical sciences, this doesn’t at all convince me that “overmathematizing” is a general problem.
In the case of quantum mechanics, the additional formalizing did certainly pay off. But the formalizing was done after the messy first version of the theory made several amazingly accurate predictions.
I thought that the simplifying assumptions, like the independence of mortgage defaults, were the root of the hidden black swan risk in the Gaussian copula. Anyway, the non-theoretical instinct trading of the early 80s led to the Savings and Loan crisis (see e.g. Liar’s Poker for an illustration of the typical attitudes), and I don’t think we have a “safer” candidate theory of financial derivatives to point to.
Sure, it’s not that the idea of the Gaussian Copula itself is wrong. It’s a mathematical theory; the only way it can be wrong is if there’s a flaw in the proof. The problem is that people were overeager to apply the GC as a description of reality. Why? Well, my belief is that it has to do with overmathematization.
Really, it’s odd that there is so much pushback against this idea. To me, it seems like a natural consequence of the Hansonian maxim that “research isn’t about progress”. People want to signal high status and affiliate with other high status folks. One way to do this is the use of gratuitous mathematics. It’s just like those annoying people who use big words to show off.
I still don’t see that you’ve demonstrated overmathematization as a hindering factor.
-Finance gurus used advanced math.
-Finance gurus made bad assumptions about the mortgage market.
How have you shown that one caused the other? What method (that you should have presented in your first post instead of dragging this out to at least four) would have led finance gurus to not make bad assumptions, and would have directed them toward less math?
I agree that it’s gotten to the point where academia adheres to standards that don’t actually maximize research progress, and too often try to look impressive at the expense of doing something truly worthwhile. But what alternate epistemology do you propose that could predictably counteract this tendency? I’m still waiting to hear it.
(And the error in assumptions was made by practitioners, where the incentive to produce meaningful results is much stronger, because they actually get a chance to be proven wrong by nature.)
I think the compression principle provides a pretty stark criterion. If a mathematical result can be used to achieve an improved compression rate on a standard empirical dataset, it’s a worthy contribution to the relevant science. If it can’t, then it still might be a good result, but it should be sent to a math journal, not a science journal.
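As a toy illustration of the criterion (my sketch; bz2 is only a stand-in for an encoder actually derived from a theory, and “new_encoder” is hypothetical):

```python
import bz2

def compression_rate(encoder, dataset: bytes) -> float:
    """Bits per input byte achieved by `encoder` (any bytes -> bytes function
    assumed to be losslessly decodable) on a shared empirical dataset."""
    return 8 * len(encoder(dataset)) / len(dataset)

# A proposed result "counts" if an encoder built from it beats the field's current baseline:
#   compression_rate(new_encoder, data) < compression_rate(bz2.compress, data)
```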
I think the problem with overmathematization is that it adds prestige to theories while making them harder to check.
I’m guessing that people tend to think the opposite of overmathematization is hand-waving. Perhaps you could talk about inappropriate mathematization instead. An example would be things like the majority of artificial neural networks: interesting maths and systems, to be sure, but a million miles away from actual neurons.
How about examples from physics and chemistry in 1964. Or do you think Platt was wrong? What’s different this time?
This probably doesn’t quite count, but how about Einstein’s and Szilard’s relatively low esteem among their peers, and consequently low influence, largely due to relatively low math ability prior to their major successes (and in Szilard’s case, even after them)?
If Szilard had been more influential, nuclear weapons could have been developed early enough to radically change the course of WWII.
This seems like your standard physics bias. That is, if what these scientists are doing doesn’t look exactly like what physicists are doing, then what they are doing isn’t really science.
Come on guys, this stuff should have died off in the 1960s. Evolutionary biology, microeconomics, and artificial intelligence cannot and should not try to be physics. The very nature of the subject matter prevents it from being so.
There’s a trivial sense in which that is correct—physics doesn’t have much in the way of cladistics, for example—but I suspect that there’s some cause to be thoughtful about this. Even in engineering, a field so ad-hoc-happy that nuclear-grade duct tape is a real product, in the one situation I know of where models without underlying mechanisms were used, the models simply failed.
There’s no requirement that mechanisms be fundamental. If I’m running hot water through a cold pipe at high enough speed, the Dittus-Boelter equation:

Nu = 0.023 Re^0.8 Pr^n (n = 0.4 for a fluid being heated, 0.3 for one being cooled)

is a mechanism, even though it doesn’t even tell me the temperature distribution in a cross-section of the pipe. It starts with physics and goes through empirical testing to produce a result. But it’s a reliable result, it parallels the behavior of the physical system at the level it models, and as a result I can design my car radiator or steam power plant or liquid electronic chip cooler and expect it to work.
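For concreteness, here is the correlation as a small calculation. The numbers and the usual validity caveats are my own illustration, not the commenter’s:

```python
def dittus_boelter_h(re, pr, k, d, heating=True):
    """Convective heat-transfer coefficient from the Dittus-Boelter correlation.

    re: Reynolds number, pr: Prandtl number, k: fluid thermal conductivity (W/m-K),
    d: pipe inner diameter (m). Roughly valid for fully developed turbulent flow
    (Re > ~10,000) and 0.6 < Pr < 160; the exponent depends on heating vs. cooling.
    """
    n = 0.4 if heating else 0.3
    nu = 0.023 * re**0.8 * pr**n   # Nusselt number
    return nu * k / d              # h, in W/(m^2 K)

# Hot water being cooled by a pipe wall (illustrative properties for ~70 C water):
h = dittus_boelter_h(re=50_000, pr=3.0, k=0.64, d=0.02, heating=False)  # roughly 6,000 W/(m^2 K)
```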
And that’s even more like physics than anything Platt or Burfoot proposed. So I wouldn’t be so quick to dismiss these kinds of remarks as sneering.
For whatever it’s worth, high energy physics is only one of Platt’s examples—the other is molecular biology. And much of the basis for evolutionary biology has a quantitative, physics-style way of working. I’m tempted to suggest that evolutionary biology has been so much more unambiguously successful than microeconomics and AI because it has aped physics so well!
I don’t have a problem with that.
In your view, how should these be conducted instead?
I’m amazed that this paper is from ’64 and so are other very interesting ones I have been reading lately. There are so many valuable gems from the past yet they seem to be never taught and quickly forgotten.
But surely they do? Every proposal for a way of doing AI (I’m reading this as AGI here) is a hypothesis about how an AI could be created, and the proposers’ failure to create an AI is the refutation of that hypothesis. Science as normal. Talk of physics envy is just an excuse for failure. The problem with excusing failure is that it leaves you with failure, when the task is to succeed.
Every proposal for turning lead into gold is a hypothesis about how lead could be turned into gold, but this doesn’t make alchemy science. Good science progresses through small problems conclusively solved, building on each other, not by trying and repeatedly failing to reach the grand goal.
I dunno. I feel like there should be a symmetry between positive results and negative results, like well-designed but failed experiments shouldn’t lose science points just because they failed.
While I wouldn’t go so far as to say that a huge number of grand designs with negative results is not science, it seems to me that they are trying to brute-force the solution.
Every negative in a brute-force attack eliminates only one key, and doesn’t give much information, since negatives are far more numerous than positives. It is not a very efficient way to search the space, and we should try to do a lot better if we can. It is the method of last resort.
I think there is a science of intelligence which (in my opinion) is closely related to computation, biology, and production functions (in the economic sense). The difficulty is that there is much debate as to what constitutes intelligence: there aren’t any easily definable results in the field of intelligence nor are there clear definitions.
There is also the engineering side: this is to create an intelligence. The engineering is driven by a vague sense of what an AI should be, and one builds theories to construct concrete subproblems and give a framework for developing solutions.
Either way, this is very different from astrophysics, where one is attempting to, say, explain the motions of the heavenly spheres, which have a regularity, simplicity, and clarity that are lacking in any formulation of the AI problem.
I would say that AI researchers do formulate theories about how to solve particular engineering problems for AI systems, and then they test them out by programming them (hopefully). I suppose I count, and that’s certainly what I and my colleagues do. Most papers in my fields of interest (machine learning and speech recognition) usually include an “experiments” section. I think that when you know a bit more about the actual problems AI people are solving, you’ll find that quite a bit of progress has been achieved since the 1960s.
Re: there aren’t any easily definable results in the field of intelligence nor are there clear definitions.
There are pretty clear definitions: http://www.vetta.org/definitions-of-intelligence/
Yes, but I guess Marks’ problem was that there are too many clear definitions. Thus, it’s not clear which to use.
Interestingly, many unclear definitions don’t have this particular problem. Clear definitions don’t tend to allow as much wiggle room for making them mutually compatible :-)
The fact that there are so many definitions and no consensus is precisely the unclarity. Shane Legg has done us all a great favor by collecting those definitions together. With that said, his definition is certainly not the standard in the field and many people still believe their separate definitions.
I think his definitions often lack an understanding of the statistical aspects of intelligence, and as such they don’t give much insight into the part of AI that I and others work on.
Interesting that you’re taking into account the economic angle. Is it related to Eric Baum’s ideas (e.g. “Manifesto for an evolutionary economics of intelligence”)?
Right, so in Kuhnian terms, AI is in a pre-paradigm phase where there is no consensus on definitions or frameworks, and so normal science cannot occur. That implies to me that people should spend much more time thinking about candidate paradigms and conceptual frameworks, and less time doing technical research that is unattached to any paradigm (or attached to a candidate paradigm that is obviously flawed).
It actually comes from Peter Norvig’s definition that AI is simply good software, a comment that Robin Hanson made, and the general theme of Shane Legg’s definitions, which are ways of achieving particular goals.
I would also emphasize that the foundations of statistics can (and probably should) be framed in terms of decision theory (see DeGroot, “Optimal Statistical Decisions”, for what I think is the best book on the topic). As a further note, the decision-theoretic perspective is neither frequentist nor Bayesian; those two approaches can both be understood through decision theory. The notion of an AI as being like an automated statistician captures at least the spirit of how I think about what I’m working on, and this requires fundamentally economic thinking (in terms of the tradeoffs) as well as notions of utility.
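(To make that frame concrete, here is the skeleton in standard textbook notation, not a quote from DeGroot: fix a loss L(θ, a) for taking action a when the true state is θ, and compare decision rules δ, which map observed data to actions, by their risk

$$R(\theta, \delta) = \mathbb{E}_{X \mid \theta}\big[L(\theta, \delta(X))\big], \qquad r(\pi, \delta) = \int R(\theta, \delta)\, \pi(d\theta).$$

A minimax rule minimizes the worst case of R over θ, while a Bayes rule minimizes the prior-averaged risk r, so both approaches emerge as ways of ordering rules within the same decision-theoretic framework.)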
Surely Peter Norvig never said that!
Go to 1:00 minute here
“Building the best possible programs” is what he says.
Ah, what he means is having an agent which will sort through the available programs—and quickly find one that efficiently does the specified task.
Excellent link—thank you!
You misquoted:
should be: