Swimming in Reasons
To a rationalist, certain phrases smell bad. Rotten. A bit fishy. It’s not that they’re actively dangerous, or that they don’t occur when all is well; but they’re relatively prone to emerging from certain kinds of thought processes that have gone bad.
One such phrase is “for many reasons”. For example, many reasons all saying you should eat some food, or vote for some candidate.
To see why, let’s first recapitulate how rational updating works. Beliefs (in the sense of probabilities for propositions) ought to bob around in the stream of evidence as a random walk without trend. When, in contrast, you can see a belief try to swim somewhere, right under your nose, that’s fishy. (Rotten fish don’t really swim, so here the analogy breaks down. Sorry.) As a Less Wrong reader, you’re smarter than a fish. If the fish is going where it’s going in order to flee some past error, you can jump ahead of it. If the fish is itself in error, you can refuse to follow. The mathematical formulation of these claims is clearer than the ichthyological formulation, and can be found under conservation of expected evidence.
More generally, according to the law of iterated expectations, it’s not just your probabilities that should be free of trends, but your expectation of any variable. Conservation of expected evidence is just the special case where a variable can be 1 (if some proposition is true) or 0 (if it’s false); the expectation of such a variable is just the probability that the proposition is true.
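For concreteness, here’s a minimal numerical sketch of that conservation property; the prior and likelihoods are made up, and any values would give the same cancellation:

```python
# Toy check of conservation of expected evidence (all numbers invented).
p_h = 0.7               # prior probability of some proposition H
p_e_given_h = 0.8       # chance of seeing evidence E if H is true
p_e_given_not_h = 0.3   # chance of seeing E if H is false

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

posterior_if_e = p_e_given_h * p_h / p_e                    # belief after seeing E
posterior_if_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)    # belief after seeing not-E

# Weight the two possible posteriors by how likely each observation is:
expected_posterior = p_e * posterior_if_e + (1 - p_e) * posterior_if_not_e
print(expected_posterior)   # ~0.7: back to the prior, so no predictable trend
```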
So let’s look at the case where the variable you’re estimating is an action’s utility. We’ll define a reason to take the action as any info that raises your expectation, and the strength of the reason as the amount by which it does so. The strength of the next reason, conditional on all previous reasons, should be distributed with expectation zero.
Maybe the distribution of reasons is symmetrical: for example, if somehow you know all reasons are equally strong in absolute value, reasons for and against must be equally common, or they’d cause a predictable trend. Under this assumption, the number of reasons in favor will follow a binomial distribution with p=.5. Mostly, the values here will not be too extreme, especially for large numbers of reasons. When there are ten reasons in favor, there are usually at least a few against.
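If you want to see those binomial numbers, here’s a quick sketch under the toy assumptions above (ten independent, equally strong reasons, each equally likely to point either way):

```python
from math import comb

n = 10  # ten equally strong reasons, each a 50/50 coin flip in direction
for k in range(n + 1):
    p = comb(n, k) * 0.5 ** n
    print(f"{k:2d} of {n} reasons in favor: p = {p:.4f}")

# All ten pointing in favor has probability 0.5**10, about 1 in 1024,
# i.e. the "one in a thousand coincidence" mentioned just below.
```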
But what if that doesn’t happen? What if ten pieces of info in a row all favor the action you’re considering?
One possibility is you witnessed a one in a thousand coincidence. But let’s not dwell on that. Nobody cares about your antics in such a tiny slice of possible worlds.
Another possibility is that the process generating new reasons conditional on old reasons, while unbiased, is not in fact symmetrical: it’s skewed. That is to say, it will mostly give a weak reason in one direction, and in rare cases give a strong reason in the other direction.
This happens naturally when you’re considering many reasons for a belief, or when there’s some fact relevant to an action that you’re already pretty sure about, but that you’re continuing to investigate. Further evidence will usually bump a high-probability belief up toward 1, because the belief is probably true; but when it’s bumped down it’s bumped far down. The fact that the sun rose on June 3rd 1978 and the fact that the sun rose on February 16th 1860 are both evidence that the sun will rise in the future. Each of the many pieces of evidence like this, taken individually, argues weakly against using Aztec-style human sacrifice to prevent dawn fail. (If the sun ever failed to rise, that would be a much stronger reason the other way, so you’re iterated-expectations-OK.) If your “many reasons” are of this kind, you can stop worrying.
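A toy calculation makes the sunrise case vivid; the likelihoods below are invented, but they produce exactly this shape: frequent tiny upward bumps, a rare huge downward one, and an average update of zero.

```python
# A belief already near 1 mostly gets nudged up a little, and is rarely
# knocked down a lot. Prior and likelihoods are invented for illustration.
p_h = 0.99                # prior: the sun reliably rises
p_rise_given_h = 0.9999
p_rise_given_not_h = 0.5

p_rise = p_rise_given_h * p_h + p_rise_given_not_h * (1 - p_h)

post_if_rise = p_rise_given_h * p_h / p_rise
post_if_no_rise = (1 - p_rise_given_h) * p_h / (1 - p_rise)

print(f"P(sunrise tomorrow)        = {p_rise:.4f}")
print(f"update if it rises         = {post_if_rise - p_h:+.4f}   (tiny, but usual)")
print(f"update if it fails to rise = {post_if_no_rise - p_h:+.4f}   (huge, but rare)")

# The probability-weighted average of the two updates is still zero,
# so the skew is compatible with conservation of expected evidence.
expected_update = p_rise * (post_if_rise - p_h) + (1 - p_rise) * (post_if_no_rise - p_h)
print(f"expected update            = {expected_update:+.6f}")
```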
Or maybe there’s one common factor that causes many weak reasons. Maybe you have a hundred legitimate reasons for not hiring someone as a PR person, including that he smashes furniture, howls at the moon, and strangles kittens, all of which make a bad impression. If so, you can legitimately summarize your reason not to hire him as, “because he’s nuts”. Upon realizing this, you can again stop worrying (at least about your own sanity).
Note that in the previous two cases, if you fail to fully take into account all the implications — for example, that a person insane in one way may be insane in other ways — then it may even seem like there are many reasons in one direction and none of them are weak.
The last possibility is the scariest one: you may be one of the fish people. You may be selectively looking for reasons in a particular direction, so you’ll end up in the same place no matter what. Maybe there’s some sort of confirmation bias or halo effect going on.
So in sum, when your brain speaks of “many reasons” almost all going the same way, grab, shake, and strangle it. It may just barf up a better, more compressed way of seeing the world, or confess to ulterior motives.
(Thanks to Steve Rayhawk, Beth Larsen, and Justin Shovelain for comments.)
(Clarification in response to comments: I agree that skewed distributions are the typical case when you’re counting pieces of evidence for a belief; the case with the rising sun was meant to cover that, but the post should have been clearer about this point. The symmetrical distribution assumption was meant to apply more to, say, many different good features of a car, or many different good consequences of a policy, where the skew doesn’t naturally occur. Note here the difference between the strength of a reason to do something, in the sense of how much it bumps up the expected utility, and the increase in probability that reason causes for the proposition that doing the thing is best; the latter gets weaker and weaker the further your estimate of the utility already exceeds the alternatives. I said “confirmation bias or halo effect”, but halo effect (preferentially seeing good features of something you already like) is more to the point here than confirmation bias (preferentially seeing evidence for a proposition you already believe), though many reasons in the same direction can point to the latter also. I’ve tried to incorporate some of this in the post text.)
Oh, good. That clarification makes things much better.
But there are still situations where you can believe a normative “for many reasons”. The most important is probably that you can have many reasons to support something as well as to oppose it. For example, there may be five strong arguments to vote Demopublican, and only four equally strong arguments to vote Republicrat. In this case, if you ask me why I’m voting Demopublican, I’d have to say “for many reasons”. Might sound like splitting hairs, but I think lots of real world cases fall into this category.
Another caveat is that this only applies to “randomly generated” alternatives. If I had to choose between a battle plan devised by a brilliant general, versus a battle plan devised by a moron, there would probably be “many reasons” to prefer the general’s, and I would keep finding more the harder I looked. All of these would come back to one reason—“it was designed by a brilliant general”, but the reason would be a fact about the plan that’s inaccessible to anyone who doesn’t know how the plan was designed, not a fact within the plan that could be discovered by looking it over. Again, may sound like splitting hairs, but a lot of real human issues could fall into this category, like using scientific versus alternative medicine.
Another possibility is that you made an error in setting your prior expectation. If you commit the planning fallacy, for instance, then it’s not surprising if new pieces of info keep suggesting that the task will take longer than you expected.
Darwin believed in the origin of species due to natural selection for many reasons, despite not having a theory of discrete inheritance or a credible explanation of how deep time was entropically possible. It was natural selection itself that he believed in for one good reason; the claim that it was the source of Earth’s varied species he believed for many.
This seems highly related to Policy Debates Should Not Appear One-Sided, which describes the distributions of pros and cons you should expect when considering the utility of an action.
In the case when you are considering a proposition about the state of the universe, the “one common factor that causes many weak reasons” could be that the proposition is really true.
Agreed—true propositions (or good actions) typically have more pro reasons than con reasons, and false propositions (or bad actions) typically have more con reasons. So when you start out with a high probability you should expect to mostly discover more pro reasons (although in a fraction of cases where it is actually false you will instead find a large number of con reasons). When you assign a 50% chance to a proposition, you should still expect the new information to cluster in the same direction; you just don’t know which direction that will be. So 10 straight pro reasons isn’t really a 1/1024 event, since they aren’t independent.
Skewed distributions also seem like the typical case. If your prior probability is not 50% then one tail of the distribution for changes in probability will be longer than the other, so the distribution will be skewed (especially since it must balance out the long tail with probability mass on the other side to obey conservation of expected evidence). The natural units for strength of evidence are log-odds, rather than probability, which gives you another way to think about why the changes in probability are skewed when you don’t start at 50% (the same increase in log-odds gives a smaller probability increase at high probabilities).
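For instance, here’s a toy illustration of that last point (the 4:1 likelihood ratio and the priors are arbitrary):

```python
import math

# The same amount of evidence in log-odds moves the probability a lot
# near 50% and only a little near the extremes.

def to_log_odds(p):
    return math.log(p / (1 - p))

def from_log_odds(l):
    return 1 / (1 + math.exp(-l))

evidence = math.log(4)  # a 4:1 likelihood ratio, about 1.39 nats of evidence

for prior in (0.5, 0.9, 0.99):
    posterior = from_log_odds(to_log_odds(prior) + evidence)
    print(f"prior {prior:.2f} -> posterior {posterior:.3f} (change {posterior - prior:+.3f})")
```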
Part of the trick in avoiding confirmation bias (and the like) is figuring out which reasons should be independent. I’ll second JGWeissman’s recommendation of Eliezer’s post for its discussion of that issue.
I agree and have added a clarification at the bottom of the post.
I’d recommend rewriting the post to make it clearer. You go back and forth between beliefs and actions in the post, without always making it clear whether you’re describing similarities or differences between them, and you give examples for your exceptions but not for your general principle (until the clarification).
I added some more caveats in the text.
Cars are both designed and selected to have lots of good features, and have complexity to accommodate this, so it’s not surprising that they do. But to support your point, there’s a limit to how many features can be optimised with a limited amount of complexity. Beyond that, there will be compromises.
Also, the appropriate comparison is probably against another similarly optimised car.
Most of this can also apply to policies, except there’s a wider variation in complexity.
Note the nice property that martingales (random processes with E[X_{t+1} | X_t] = X_t) taking values in a bounded space (such as probabilities between 0 and 1) almost surely converge (that is, lim_{t → ∞} X_t exists).
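Here’s a small simulation sketch of that, with a made-up hypothesis and likelihoods: each run’s posterior settles down (usually near 0 or 1), while the average over runs stays at the prior.

```python
import random

# A Bayesian posterior probability is a martingale bounded in [0, 1],
# so individual sample paths settle down rather than wandering forever.
# Toy hypothesis and likelihoods invented for illustration.

random.seed(0)

def posterior_path(n_steps=2000):
    # H: "the coin is biased 60% heads"; otherwise the coin is fair.
    h_true = random.random() < 0.5
    p_heads = 0.6 if h_true else 0.5
    p_h = 0.5  # prior
    for _ in range(n_steps):
        heads = random.random() < p_heads
        like_h = 0.6 if heads else 0.4
        like_not_h = 0.5
        p_h = like_h * p_h / (like_h * p_h + like_not_h * (1 - p_h))
    return p_h

# Each path converges toward 0 or 1; averaging many paths recovers the prior.
paths = [posterior_path() for _ in range(200)]
print("average final belief:", sum(paths) / len(paths))   # close to the 0.5 prior
print("sample final beliefs:", [round(p, 3) for p in paths[:5]])
```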
Unless you have actually used evidence to form the belief. If previous evidence genuinely supports a belief, then unless you think the real world is random, further evidence is more likely to support that belief than not.
Those are not “many reasons”; that’s one reason repeated many times.
That turns out not to be the case, though the reason why can initially seem unintuitive. If you have fully used all the information from piece of evidence A, that will include the fact that correlated piece of evidence B will be more likely to come up. This means that B will sway your beliefs less, because it is not a surprise. Contrariwise, anticorrelated piece of evidence C will be less likely to come up, and hence be more of a surprise if it does, and will move your beliefs further. Averaging over all possible new pieces of evidence, weighted by how likely they are, it has to be a wash: if it’s not a wash, then you should already have updated to the point that would be your average expected update.
(Note that for something like parameter estimation, where rather than a single belief you use a probability density, each point of the density will on average stay the same under any new piece of evidence, but which parts of the density go up, which go down, and by how much are highly correlated.)
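To make the correlated-evidence point above concrete, here’s a toy model with an assumed structure where A and B are two noisy readings of the same underlying test T; all the numbers are invented.

```python
from itertools import product

# Assumed toy structure: H -> T -> (A, B). A and B are two noisy readings
# of the same underlying test T, so they are correlated. Numbers invented.
P_H = 0.5
P_T = {True: 0.9, False: 0.2}      # P(T positive | H) and P(T positive | not-H)
P_READ = {True: 0.9, False: 0.1}   # P(a reading is positive | T's value)

def joint(h, t, a, b):
    p = P_H if h else 1 - P_H
    p *= P_T[h] if t else 1 - P_T[h]
    p *= P_READ[t] if a else 1 - P_READ[t]
    p *= P_READ[t] if b else 1 - P_READ[t]
    return p

def condition(pred):
    # Returns (P(H | pred holds), P(pred holds)) by brute-force enumeration.
    total = with_h = 0.0
    for h, t, a, b in product([True, False], repeat=4):
        if pred(a, b):
            p = joint(h, t, a, b)
            total += p
            with_h += p if h else 0.0
    return with_h / total, total

p_h_given_a, p_a = condition(lambda a, b: a)
p_h_given_b, p_b = condition(lambda a, b: b)
p_h_given_ab, p_ab = condition(lambda a, b: a and b)

print(f"P(H)           = {P_H:.3f}")
print(f"P(H | A)       = {p_h_given_a:.3f}   (A alone is a big update)")
print(f"P(H | B)       = {p_h_given_b:.3f}   (so would B alone have been)")
print(f"P(H | A and B) = {p_h_given_ab:.3f}   (but after A, B adds much less)")
print(f"P(B) = {p_b:.3f}  vs  P(B | A) = {p_ab / p_a:.3f}   (A made B unsurprising)")
```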
By “belief”, grandparent means probability of a hypothesis, which does bob around without trend in a perfect Bayesian reasoner.
The impression I get of the difference here between a “belief” and a “hypothesis” is something like this:
I have the belief that the sun will continue to rise for a long long time.
This is probably “true.”
I have the hypothesis that the sun will rise tomorrow morning with probability .999999.
Conservation of expected evidence requires that, in pure Bayesian fashion, if it does rise tomorrow my probability will rise to .9999991, and if it doesn’t it will shoot down to .3 or something, in such a way that the probability-weighted average of all possible shifts is zero: a trendless random walk.
That is, if the hypothesis comes true about as often as your stated confidence says, all is well; if it comes true too often, you were underconfident and the hypothesis has an issue.
I like the fish metaphor for conservation of expected evidence, but I don’t think the “for many reasons” phrase fits well with it. That phrase usually doesn’t involve the dynamic component of repeated updating; the reasons are just all there, rather than arriving sequentially over time.
Really good metaphor. I will surely borrow it.
If a supposedly-rotten fish is swimming, though, that’s certainly reason to be suspicious.