[SEQ RERUN] The Dilemma: Science or Bayes?
Today’s post, The Dilemma: Science or Bayes? was originally published on 13 May 2008. A summary (taken from the LW wiki):
The failure of first-half-of-20th-century-physics was not due to straying from the scientific method. Science and rationality—that is, Science and Bayesianism—aren’t the same thing, and sometimes they give different answers.
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we’ll be going through Eliezer Yudkowsky’s old posts in order so that people who are interested can (re-)read and discuss them. The previous post was The Failures of Eld Science, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day’s sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.
What makes “science vs. Bayes” a dichotomy? The scientific method is just a special case of Bayesian reasoning. I mean, I understand the point of the article, but it seems like it’s way less of a dilemma in practice.
It’s a dichotomy in this specific case where science says “don’t care, same math, same predictions” and EY’s Bayes says “my model is simpler than yours, so it’s better”. The dichotomy disappears once the models are different experimentally, except that one should still strive to find the Kolmogorov-simplest model with the same predictive power. In any case, EY’s point, the way I understood it, is that when the scientific method fails (different models are not easily testable, like in economics, for example), one should “fall back” on Bayes.
It is foolish to strive to find the Kolmogorov-simplest model, because that task is known to be impossible.
Fallacy of Gray.
The Fallacy of Gray is a good post, but my comment is not an instance of such a fallacy.
The point he was trying to make is that it is not foolish to strive for a Kolmogorov-simpler model.
If always finding perfection is impossible, striving to find it and moving closer to it at every opportunity isn’t.
“Striving to find it” and “moving closer to it at every opportunity” can be very different things.
When the “perfection” in question is something that you know is impossible to achieve (and in any given nontrivial case, you know you’ll be unable to establish you’ve achieved it even if by chance you did), establishing it as your goal—which is what “striving to find it” is—is foolish.
On the other hand, finding simpler models certainly is a good idea. But it’s good not because it gets us “closer at every opportunity” to the Kolmogorov-simplest model, for two reasons. One is stated in the parentheses above, and the second is that “closer” is almost meaningless when you know that you not only cannot compute K, you can’t in general prove lower bounds on it either (by Chaitin’s Incompleteness), which means that you have no idea how much closer you’re getting to the ideal value with every particular simplification.
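To make the point concrete: a simplification only ever exhibits one concrete, shorter description, and it says nothing about how far away the shortest one is. Here is a minimal sketch, assuming zlib-compressed length as a crude stand-in for description length (it is not K itself, and the helper name `description_length` is made up for illustration):

```python
import random
import zlib


def description_length(data: bytes) -> int:
    """Length in bytes of ONE concrete description of `data`:
    its zlib encoding at the highest compression level.
    This is a stand-in for illustration, not Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


# A highly regular string and a pseudo-random one of the same raw length.
regular = b"ab" * 500
rng = random.Random(0)  # fixed seed so the sketch is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

print(len(regular), description_length(regular))
print(len(noisy), description_length(noisy))
```

The compressor finds a much shorter description of the regular string than of the noisy one, but in neither case does it tell you whether a still shorter description exists.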
Well, I’m not buying K-complexity goal in particular, which is why I said only “perfection”; I’m making a different point. The thing about goals is that they are not up for grabs, they can’t in themselves be foolish, only actions or subgoals can be foolish. Foolishness must follow from incongruity with some higher goal (that cares nothing for efficiency or probability of success, mere instrumental drives), so if one’s goal is to optimize some hard-to-optimize quality whose level of optimality is also hard to gauge, that’s still what one should do, if only by taking hard-to-arrange accidental opportunities for improvement.
I guess we’re talking past each other then, because I (plausibly, I think, given the context) took your original reply to still refer to the Kolmogorov complexity goal. My beef is with that particular formulation, because I find it sometimes to be illegitimately overused for (what amounts to merely) emotional effect. I’m all for working on optimizing imperfectly-defined, hard-to-pin-down goals! Been doing that for a while with my life. (the results are mixed)
The hypothetical still applies, I think. Suppose minimizing K-complexity happened to be one’s goal, then there are probably some steps that can be taken in its pursuit, and in any case it wouldn’t be right to call it “foolish” if it’s indeed the goal, even in the unlikely situation where nothing whatsoever can predictably advance it (maybe one should embark on a quest to find a Turing oracle or something). It might be foolish to think that it’s a human (sub)goal though, where it would clash with what we actually seek.
Is it known what the highest complexity is, beyond which Chaitin’s Incompleteness applies? If it is relatively large, it is possible that all hypotheses interesting for humans have complexity lower than that...
In particular I wish to extract from that paper the following very simple (albeit non-constructive) proof of Chaitin’s Incompleteness:
Given a (reasonable, sound) formal theory F, we know that F cannot prove all true sentences of the form “Program P never halts” (the reason is that if it could, we could solve the halting problem by searching over all possible proofs in F for the proof of P either halting with a particular run, or never halting, being sure our search will finish in finite time). Consider the shortest program P such that P never halts but F cannot prove that fact. Let L(P) be its length. Claim: F can never prove that the Kolmogorov complexity of anything is greater than L(P). Proof: given any output X, F can never refute the possibility that P might yet halt at some future time and output exactly X. Therefore L(P) must remain a candidate for the Kolmogorov complexity of X as far as F is concerned.
Edit: never mind this. I’ve realized the proof is wrong. It’s only true that “F can never refute the possibility that P might yet halt at some time and output some Y”, but it is not true that “F can never refute the possibility that P might yet halt and output this specific X”. It’s conceivable (albeit unusual) that P doesn’t halt, F is unable to prove that, but is able to prove that should P halt, its output will not be X. For example, think of P as a Turing machine with one halt state, which is easily “backwards-traceable” to a sequence of actions that erases the entire tape so far and writes out “123”. Then F can easily be strong enough to be able to prove that if P halts at all, it outputs “123” and not anything else.
I emailed the article’s author and he replied acknowledging the problem, which has been raised by a bunch of people before, and giving me links to a few paywalled articles with the correct exposition. However, this correct exposition is nowhere near as succinct and attractive as the short and faulty proof above.
LOL
I don’t know much about it, but searching leads to this pretty interesting-looking paper which argues that the bound is largely incidental: http://www.mv.helsinki.fi/home/praatika/chaitinJPL.pdf
Thanks! The paper does give an answer, obvious in hindsight: the threshold constant for a formal system F is determined by the shortest Turing Machine that does not halt, but that fact is not provable in F.
Unfortunately, this turns out to be subtly wrong—see my update in a sibling comment.
That’s a pity. Still, given any non-halting TM for which F cannot prove that, it is easy (costs very little additional constant complexity) to build a TM for which F also cannot prove that it does not return any x. And this still answers my original question (whether the threshold is very high) in the negative. So, bad for K-complexity, we need something better.
I was simply trying to interpret what EY wrote. I should have said something like “prefer a K-simpler model”. Personally, I do not support expending much effort looking for a simpler model with the same predictive power, except as a step to finding a model with better predictive power.
There are occasional examples of one thing being strictly simpler than another. For example, “lightning is thrown by Thor, and also Maxwell’s equations, Coulomb’s Law, and the atomic theory of matter are true” is simpler if you just cut out Thor. So you should cut out Thor. So you should at least strive to that extent :P
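The Thor case can be put in numbers. The following is only a sketch: the prior values and the shared likelihood are assumptions chosen for illustration, and `posterior` is a hypothetical helper, not anything from the original post. Because both hypotheses make the same predictions, the evidence cannot separate them, and the extra clause about Thor only ever costs probability.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a list of mutually exclusive hypotheses."""
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Assumed numbers, for illustration only.
P_PHYSICS = 0.5              # prior that the physics is right at all
P_THOR_GIVEN_PHYSICS = 0.01  # prior that Thor is also involved

priors = [
    P_PHYSICS * (1 - P_THOR_GIVEN_PHYSICS),  # physics, no Thor
    P_PHYSICS * P_THOR_GIVEN_PHYSICS,        # physics and Thor
]
# Same math, same predictions: identical likelihood for any observation.
likelihoods = [0.9, 0.9]

no_thor, with_thor = posterior(priors, likelihoods)
print(no_thor, with_thor)
```

Between these two hypotheses the posterior odds equal the prior odds, so no amount of lightning data ever rescues the Thor clause.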
Agreed that striving to find simpler theories is, generally speaking, a worthy goal. What I tried to emphasize is that striving to find the simplest one—in the particular Kolmogorov sense—isn’t.
The scientific method is inaccurate: it doesn’t use evidence in the best way it could be used. In practice, science has worked well enough, often enough, for a long time, but you can do better.
Science is simple enough that you can sic a bunch of people on a problem with a crib sheet and an “I can do science, me” attitude, and get a good enough answer early. The mental toolkit for applying Bayes is harder to give to people. I am right at the beginning, approaching from a mentally lazy background with a little psychology and engineering; the first time I saw the word Bayes was in a certain Harry Potter fanfic a week or so ago. I failed the insightful tests in the early sequences, caught myself noticing I was confused and not doing anything about it, and failed all over again in the next set of insightful tests. I have a way to go.
The time it takes for me to get an “I can do Bayes, me” attitude, even with a crib sheet, could have been spent solving a bunch of other problems.
If the choice is between science and Bayes, which at my low level of training I suspect is a false choice, then at the moment I would go with science because I am better at it than at Bayes. It’s like typing: I use Qwerty, not Dvorak, because I type faster on Qwerty, even though Dvorak is (allegedly) better.
Given that each person at the moment has finite problem-solving time, an argument could be made for applying Science to problems, as it is easier to teach. That being said, “I notice that I am confused” would have saved me a lot of trouble if I had heard of it earlier.
Sure.
More generally, if I don’t want to optimize X, but merely want to satisfy some threshold T for X, then I don’t really care what the optimal way of doing X is in general, I care what way of doing X gets me across T most cheaply. If getting across T using process P1 costs effort E1 from where I am now, and P2 costs E2, and E2 > E1, and I don’t care about anything else, I should choose P1.
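As a rough sketch of that rule (the `Process` class, the function name, and all the numbers are made up for illustration): among the processes that get across T, pick the one with the lowest effort, and ignore how far past T each one would go.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Process:
    name: str
    effort: float  # cost of using this process from where I am now
    result: float  # level of X the process is expected to reach


def cheapest_satisficing(processes: List[Process], threshold: float) -> Optional[Process]:
    """Return the lowest-effort process whose result meets the threshold,
    or None if no process gets across it."""
    good_enough = [p for p in processes if p.result >= threshold]
    if not good_enough:
        return None
    return min(good_enough, key=lambda p: p.effort)


# Hypothetical numbers: P2 reaches a better result but costs more effort.
options = [Process("P1", effort=2.0, result=7.0),
           Process("P2", effort=5.0, result=9.0)]

choice = cheapest_satisficing(options, threshold=6.0)
print(choice.name)
```

Raising the threshold above what P1 can reach flips the answer to P2, which is why the estimate of each process’s result matters as much as its cost.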
The catch is, like a lot of humans, I also have a tendency to overestimate both the effectiveness of whatever I’m used to doing and the costs of changing to something else. So it’s very easy for me to dismiss P2 on the grounds of an argument like the above even in situations where E1 > E2, or where it turns out that I do care about other things, or both.
There are some techniques that help with countering that tendency. For example, it sometimes helps to ask myself from time to time whether, if I were starting from scratch, I would choose P1 or P2. (E.g. “if I were learning to type for the first time, would I learn Dvorak or Qwerty?”). Asking myself that question lets me at least consider which process I think is superior for my purposes, even if I subsequently turn around and ignore that judgment due to status-quo bias.
That isn’t great, but it’s better than failing to consider which process I think is better.
Well said. In considering your response I notice that a process P as part of its cost E has room to include the cost of learning the process if necessary, something that was concerning me.
I am now considering a more complicated case.
You are in a team of people of which you are not the team leader. Some of the team are scientists, some are magical thinkers, and you are the only Bayesian.
Given an arbitrary task which can be better optimised using Bayesian thinking, is there a way of applying a “Bayes patch” to the work of your teammates so that they can benefit from the fruits of your Bayesian thinking without knowing it themselves?
I suppose I am trying to ask how easily or how well Bayes can be applied to undirected work by non-Bayesian operators. If I were a scientist in a group full of magical thinkers, all of us with a task, I do not know what they would come up with, but I reckon I would be able to make some scientific use of the information they generate. Is the same the case for Bayes?
I expect it depends rather a lot on the nature of the problem, and on just what exactly we mean by “science,” “magical thinking,” and “Bayes”.
I find, thinking about your question, that I’m not really sure what you mean by these terms. Can you give me a more concrete example of what you have in mind? That is, OK, there’s a team comprising A, B, and C. What would lead me to conclude that A is a “magical thinker”, B is a “Bayesian,” and C is a “scientist”?
For my own part, I would say that the primary difference has to do with how evidence is evaluated.
For example, I would expect A, in practice, to examine the evidence holistically and arrive at intuitive conclusions about it, whereas I would expect B and C to examine the evidence more systematically. In a situation where the reality is highly intuitive, I would therefore expect A to arrive at the correct conclusion with confidence quickly, and B and C to confirm it eventually. In a situation where the reality is highly counterintuitive, I would expect A to arrive at the wrong conclusion with confidence quickly, while B and C become (correctly) confused.
For example, I would expect B and C, in practice, to try and set up experimental conditions under which all observable factors but two (F1 and F2) are held fixed, and F1 is varied and F2 measured and correlations between F1 and F2 calculated. In a situation where such conditions can be set up, and strong correlations are observed between certain factors, I would expect C to arrive at correct conclusions about causal links with confidence slowly, and B to confirm them even more slowly. In a situation where such conditions cannot be set up, or where no strong correlations are observed between evaluated factors, I would expect C to arrive at no positive conclusions about causal links, and B to arrive at weak positive conclusions about causal links.
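A minimal sketch of the procedure described above, with everything but F1 and F2 assumed to be held fixed. The measurements are invented for illustration, and `pearson` is a hand-written helper for the ordinary Pearson correlation coefficient:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented data: F1 is the factor we vary, F2 is the factor we measure.
f1_settings = [1.0, 2.0, 3.0, 4.0, 5.0]
f2_readings = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson(f1_settings, f2_readings)
print(r)
```

A coefficient close to 1 or -1 is the “strong correlation” case; a coefficient near 0 is the case where C would draw no positive conclusion.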
Are these expectations consistent with what you mean by the terms?
I agree with the terms. For the sake of explanation, by “magical thinker” I was thinking along the lines of young children without science training, or people who have either no knowledge of or no interest in the scientific method. Ancient Greek philosophers could come under this label if they never experimented to test their ideas. The essence is that they theorise without testing their theory.
In terms of the task, my first idea was the marshmallow test from a TED lecture, “make the highest tower you can that will support a marshmallow on top from dry spaghetti, a yard of string, and a yard of tape.”
Essentially a situation where the results are clearly comparable, but the way to get the best result is hard to prove. So far triangles are the way to go, but there may be a better way that nobody has tried yet. If the task has a time limit, is it worth using scientific or Bayesian principles to design the tower, or is it better to just start taping some pasta?
At the risk of repeating myself… it depends on the properties of the marshmallow-test task. If it is such that my intuitions about it predict the actual task pretty well, then I should just start taping pasta. If it is such that my intuitions about it predict the task poorly, I might do better to study the system… although if there’s a time limit, that might not be a good idea either, depending on the time limit and how quickly I study.