Crush Your Uncertainty
Bayesian epistemology and decision theory provide a rigorous foundation for dealing with mixed or ambiguous evidence, uncertainty, and risky decisions. You can’t always get the epistemic conditions that classical techniques like logic or maximum likelihood require, so this is seriously valuable. However, having internalized this new set of tools, it is easy to fall into the bad habit of no longer avoiding the situations that force you to use them.
When I first saw the light of an epistemology based on probability theory, I tried to convince my father that the Bayesian answer to problems involving an unknown process (e.g. Laplace’s rule of succession) was superior to the classical answer (e.g. maximum likelihood). He resisted, with the following argument:
1. The maximum likelihood estimator plus some measure of significance is easier to compute.
2. In the limit of lots of evidence, this agrees with Bayesian methods.
3. When you don’t have enough evidence for statistical significance, the correct course of action is to collect more evidence, not to take action based on your current knowledge.
I added conditions (e.g. what if there is no more evidence and you have to make a decision now?) until he grudgingly stopped fighting the hypothetical and agreed (months later, mind you) that the Bayesian framework was superior in some situations.
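To make the disagreement concrete, here is a minimal sketch (my own illustration, not anything my father actually wrote down) comparing the two estimators for a binary process with k successes observed in n trials:

```python
# A minimal sketch of the two answers for an unknown binary process
# with k successes observed in n trials.

def mle_estimate(k, n):
    """Classical point estimate: the observed frequency."""
    return k / n

def laplace_estimate(k, n):
    """Laplace's rule of succession: (k + 1) / (n + 2),
    i.e. the posterior mean under a uniform prior."""
    return (k + 1) / (n + 2)

for k, n in [(3, 3), (0, 2), (70, 100)]:
    print(f"k={k}, n={n}: MLE={mle_estimate(k, n):.3f}, "
          f"Laplace={laplace_estimate(k, n):.3f}")
```

With three successes out of three, maximum likelihood confidently says 1.000 while Laplace’s rule says 0.800; at 70 out of 100 the two essentially agree, which was exactly his second point.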
I now realize that he was right to fight that hypothetical, and he was right that you should prefer classical max likelihood plus significance in most situations. But of course I had to learn this the hard way.
It is not always, or even often, possible to get overwhelming evidence. Sometimes you only have visibility into one part of a system. Sometimes further tests are expensive, and you need to decide now. Sometimes the decision is clear even without further information. The advanced methods can get you through such situations, so it’s critical to know them, but that doesn’t mean you can laugh in the face of uncertainty in general.
At work, I used to do a lot of what you might call “cowboy epistemology”. I quite enjoyed drawing useful conclusions from minimal evidence and careful probability-literate analysis. Juggling multiple hypotheses and visualizing probability flows between them is just fun. This seems harmless, or even helpful, but it meant I didn’t take gathering redundant data seriously enough. I now think you should systematically and completely crush your uncertainty at all opportunities. You should not be satisfied until exactly one hypothesis has non-negligible probability.
Why? Suppose I’m investigating a system. We are not completely clear on what’s going on, but the current data is enough to suggest a course of action, and a value-of-information calculation says the decision is unlikely enough to change that further investigation isn’t worth it. Why, then, should I go and do further investigation to pin down the details?
The first reason is the obvious one: stronger evidence can make up for human mistakes. While a lot can be said for its power, the human brain is not a precise instrument; sometimes you’ll feel a little more confident, sometimes a little less. As you gather evidence toward the point where you feel you have enough, that random fluctuation can cause you to stop early. But this only suggests that you should have a small bias toward gathering a bit more evidence.
The second reason is that even if you can make the correct immediate decision, that residual uncertainty will eventually come back to bite you. Your habits and heuristics derived from the initial investigation will drift away from what’s actually going on. You would not expect this in a perfect reasoner, who would always use their full uncertainty in all calculations, but again, the human brain is a blunt instrument, and likes to simplify things. What was once a nuanced probability distribution like “95% X, 5% Y” might slip to just “X” when you’re not quite looking, and then, 5% of the time, something comes back from the grave to haunt you.
The third reason is computational complexity. Inference with very high certainty is easy; it’s often just simple direct math or a clear intuitive visualization. With a lot of uncertainty, on the other hand, you need to repeat your computation for each probable world (or some sample of them), or you need to find a shortcut (e.g. analytic methods), which is only sometimes possible. This is an unavoidable problem for any bounded reasoner.
For example, you simply would not be able to design chips or computer programs if you could not treat transistors as infallible logical gates, and if you really really had to do so, the first thing you would do would be to build an error-correcting base system on top of which you could treat computation as approximately deterministic.
It is possible in small problems to manage uncertainty with advanced methods (e.g. Bayes), and this is very much necessary while you decide how to get more certainty, but for unavoidable computational reasons it is not sustainable in the long term, and must be a temporary condition.
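Here is a hedged sketch of what that computational cost looks like (the hypothesis names and payoffs are mine, purely illustrative): evaluating a plan under one surviving hypothesis is a single calculation, while evaluating it under residual uncertainty means marginalizing over every probable world.

```python
# Illustrative only: reasoning under residual uncertainty vs under one
# surviving hypothesis.

def evaluate_plan(plan, world):
    """Stand-in for whatever (possibly expensive) calculation tells you
    how well the plan does if this world is the true one."""
    return plan["payoffs"][world]

def expected_value(plan, worlds):
    """Marginalize over hypotheses: one evaluation per probable world."""
    return sum(p * evaluate_plan(plan, world) for world, p in worlds.items())

plan = {"payoffs": {"hypothesis_A": 5.0, "hypothesis_B": -2.0}}

uncertain = {"hypothesis_A": 0.95, "hypothesis_B": 0.05}  # residual uncertainty
crushed = {"hypothesis_A": 1.0}                           # uncertainty crushed

print(expected_value(plan, uncertain))  # every branch must be evaluated
print(expected_value(plan, crushed))    # a single direct calculation
```

With two hypotheses the overhead is trivial, but the number of probable worlds multiplies with every unresolved question, and so does the work.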
If you take the habit of crushing your uncertainty, your model of situations can be much simpler and you won’t have to deal with residual uncertainty from previous related investigations. Instead of many possible worlds and nuanced probability distributions to remember and gum up your thoughts, you can deal with simple, clear, unambiguous facts.
My previous cowboy-epistemologist self might have agreed with everything written here, but failed to really get that uncertainty is bad. Having just been empowered to deal with uncertainty properly, I tended not just to be unafraid of uncertainty, but to think that it was OK, or even to glorify it. What I’m trying to convey here is that that aesthetic is mistaken, and as silly as it feels to have to repeat something so elementary, uncertainty is to be avoided. More viscerally, uncertainty is uncool (unjustified confidence is even less cool, though).
So what’s this all got to do with my father’s classical methods? I still very much recommend thinking in terms of probability theory when working on a problem; it is, after all, the best basis for epistemology that we know of, and is perfectly adequate as an intuitive framework. It’s just that it’s expensive, and in the epistemic state you really want to be in, that expense is redundant in the sense that you can just use some simpler method that converges to the Bayesian answer.
I could leave you with an overwhelming pile of examples, but I have no particular incentive to crush your uncertainty, so I’ll just remind you to treat hypotheses like zombies; always double tap.
This is a good post, but I’d like to understand better what kind of person would benefit from your advice. Can you describe more concretely what kind of situations you’re dealing with?
Moved to Discussion (the lack of examples played some part in this, but it was mostly the reception in terms of total upvotes, which I suspect also had something to do with the lack of examples).
Ok. Some typical examples of risky decisions under uncertainty people have to make:
(a) Should I take a job in a new city?
(b) Should I go to graduate school or get a job out of college?
(c) Should I buy a house now (e.g. what will the housing market do in 5 years?)
(d) Should I marry this person?
Here is another problem that is more academic:
(e) How do I learn a causal graph from data?
Bayesian epistemology (or approximation thereof) vs frequentism, go! I am calling your bluff. Do you actually know what you are talking about?
Downvoted for an uncharitable interpretation of the OP.
You seem to have cherry-picked examples where collecting additional evidence appears to be either very costly or impossible. And failed at it. In nearly all your examples it is possible and advisable to collect more evidence before (or while) going all Bayesian on the problem (the first half of the OP’s point). And with enough evidence there would be little difference between Bayesian and frequentist calculations. Which is the other half of the OP’s point. You missed both halves.
Based on Nyan’s subsequent post, what he was trying to say was “get more data,” which is a point that, as he correctly notes (though not in the OP), is orthogonal to B vs F.
Ok. I guess I was confused by this start:
“Bayesian epistemology and decision theory provide a rigorous foundation for dealing with mixed or ambiguous evidence, uncertainty, and risky decisions. You can’t always get the epistemic conditions that classical techniques like logic or maximum likelihood require, so this is seriously valuable.”
Also by the fact that the word “Bayesian” is used lots of times in the OP. I like causal graphs. It doesn’t mean I have to sprinkle them into every post I make on every subject :).
I was not trying to cherry pick examples for any particular purpose. These examples were all difficult decisions I had to make in my life, except the last one, which is an academic example where B vs F considerations are very subtle. I don’t know what it means to “go Bayesian” on these examples. What made them difficult was not the kind of thing that Bayes theorem would have made easier.
I guess my view is, unless you are doing stats/machine learning (and maybe not even then!), you ought to have no opinion on B vs F. This is an argument that will not affect your life.
Huh, I thought your examples (some of which are life-affecting) were supposed to demonstrate that there are plenty of cases where, for the same limited set of data, B > F, but it looks like I misunderstood your point completely. Sorry about that.
I like the thesis, but remember that decision theory is not the only difference between the frequentist and Bayesian frameworks. Crushing uncertainty is a useful prescription from frequentist epistemology, but Bayesianism and frequentism also disagree often on how uncertainty can be crushed. The Bayesian framework can account for a much wider variety of evidence (all of it, in the ideal), whereas frequentism can usually only utilize repetitive “iid” or controlled experimental data.
The real advantage of Bayesian analysis is drawing strong, high-certainty conclusions from bits of noisy data that a frequentist would just throw away. For instance, when decent priors are available, Bayesian analysis can draw very strong conclusions from very small data sets which a frequentist would dismiss as insignificant. I recently ran across a great example of this, which I will probably write up soon in a LW post.
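For a rough illustration of the general point (my own toy numbers, not the example the parent comment has in mind): with an informative prior, three successes in three trials can already produce a confident posterior, whereas a frequentist test on n = 3 would simply call the result insignificant.

```python
import math

def beta_tail_probability(a, b, threshold, steps=100_000):
    """P(theta > threshold) under a Beta(a, b) distribution, computed by
    simple midpoint integration so no external libraries are needed."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = (1.0 - threshold) / steps
    total = 0.0
    for i in range(steps):
        x = threshold + (i + 0.5) * width
        total += math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x))
    return total * width

# Hypothetical informative prior: past experience suggests roughly an 80%
# success rate, encoded as Beta(8, 2). Then we observe 3 successes, 0 failures.
prior_a, prior_b = 8, 2
successes, failures = 3, 0
post_a, post_b = prior_a + successes, prior_b + failures

print("P(success rate > 0.5) =", round(beta_tail_probability(post_a, post_b, 0.5), 3))
# ~0.997 here, while n = 3 is far too small to reach classical significance.
```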
Maybe I focused too much on the Bayesianism vs frequentism thing. That’s not the point. You should use Bayesian methods, just gather lots more data than is locally necessary.
So yeah, sometimes that data is different types of evidence that a frequentist analysis couldn’t even correlate, but again, the point is to overdo it, not to disregard those many arguments or whatever.
Interesting responses. Given that many of them miss the point, the point about a lack of examples is well taken. It is evidently impossible to communicate without using examples.
Now that’s a hypothesis with only a bit of evidence and not much confidence, and the heuristic I’m getting at here would suggest that I really ought to consider collecting a wider sample. Maybe write 10 big comments randomly assigned to be exampleful or not exampleful, and see what actual correlations come up. Note that if I don’t do that, I would have to think about subtle distinctions and effects, and many possibilities, but if I do, there’s no room for such philosophy; the measurements would make it clear with little interpretation required.
And that forms our first example of what I’m trying to get at; when you form a hypothesis, it’s a good idea to immediately think of whether there is an experiment that could disambiguate so far that you could think of it as a simple fact, or alternatively reveal your hypothesis as wrong. This is just the virtue of empiricism, which I previously didn’t take seriously.
Maybe this is only useful to me due to my particular mental state before and after this idea, and because of the work I do. So, to be clear, here are some examples of the kind of stuff I had in mind:
Suppose you are designing a zinc-air alkaline fuel cell (as I am) and you see a funny degradation in the voltage over extended time, and a probe voltage wandering around. Preliminary investigations reveal (or do they?) that it’s an ohmic (as opposed to electrochemical) effect in the current transfer parts. The only really serious hypothesis is that there is some kind of contact corrosion due to leaking. Great, we know what it is, let’s rip it apart and rebuild it with some fix for that (solder). “No” says the Crush Your Uncertainty heuristic, “kick it when it’s down, kill all other hypotheses, prove it beyond all suspicion.”
So you do; you take it apart and painstakingly measure the resistance between all points and note the peculiar distribution of resistances very much characteristic to a corrosion issue in the one particular spot. But then oh look, the resistance depends on how hard you push on it (corrosion issue), the pins are corroded, and the contact surface has caustic electrolyte in it. And then while we’re at it, we notice that the corrosion doesn’t correlate with the leaks, and is basically everywhere conditional on any leak, because the nickel current distribution mesh wicks the electrolyte all over the place if it gets anywhere. And you learn a handful of other things (for example, why there are leaks, which was incidentally revealed in the thorough analysis of the contact issue).
...And we rip it apart and rebuild it with some solder. The decision at hand didn’t change, but the information gained killed all uncertainty and enlightened us about other stuff.
So then imagine that you need lots of zinc in a particular processed form to feed your fuel cell, and the guys working that angle are debugging their system so there’s never enough. In conversation it’s revealed that you think there was 3 kg delivered, and they think they delivered 7. (For various reasons you can’t directly measure it.) That’s a pretty serious mistake. On closer analysis with better measurements next time, you both estimate ~12 kg. OK there’s actually no problem; we can move on to other things. “No” says the Crush Your Uncertainty heuristic, “kick it when it’s down.” This time you don’t listen.
...And it comes back to bite you. Turns out it was the “closer” inspection that was wrong (or was it?). Now it’s late in the game, the CEO is looking for someone to blame, and there’s some major problem that you would have revealed earlier if you’d listened to the CYU heuristic.
There’s a bunch of others, mostly technical stuff from work. In everyday life I don’t encounter enough of these problems to really illustrate this.
Again, this isn’t really about Bayes vs Frequentism, it’s about little evidence and lots of analysis vs lots of evidence and little analysis. Basically, data beats algorithms, and you should take that seriously in any kind of investigation.
Yeah. I work as a programmer, and it took me a while to learn that even if you’re smart, double-checking is so much better than guessing, in so many unexpected ways. Another lesson in the same vein is “write down everything”.
For important decisions, yes.
For the majority of matters over which I am uncertain, the costs of gathering more data exceed the utility to be gained by making the right choice.
Right. First of all, the cases I’m dealing with that prompted this (high speed iterative R&D and debugging) may be different from most circumstances.
Second, I assert that even when the utility to be gained directly by better informed decisions is low, it is still a good idea to gather more data because of out-of-decision concerns like giving yourself more contact with reality, and having simpler models/memories for future selves to remember and build on.
Sometimes intuitive cost-benefit analyses in this area fail badly due to a variety of biases.
I think of it as making decisions. If there’s “30% chance of rain”, you can’t take 30% of an umbrella. You either take one, or not, and the decision turns out to be right, or not. Uncertainty is continuous, decision is discontinuous.
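A tiny sketch of that point, with made-up costs: the belief is continuous, but the decision is a hard threshold.

```python
# Made-up costs, for illustration: continuous belief, discontinuous decision.
P_RAIN = 0.30
COST_OF_CARRYING = 1.0   # mild annoyance of carrying the umbrella all day
COST_OF_SOAKING = 10.0   # getting caught in the rain without it

expected_cost_take = COST_OF_CARRYING
expected_cost_leave = P_RAIN * COST_OF_SOAKING

# You take a whole umbrella or none at all, never 30% of one.
print("take it" if expected_cost_take < expected_cost_leave else "leave it")
```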
Decisions are not made only at the end of reasoning out a situation, but all the way through, as you discard possibilities when they appear not to be worth further attention. For that matter, sometimes you have to discard possibilities that are worth further attention, because you can’t explore everything in order to decide which things to explore.
He makes an excellent point. I have noticed that this “pick the best model based on logic or Bayesian inference” tends to act as a curiosity stopper here on occasion. My pet peeve: “MWI clearly wins! No need to test it further.”
I hope that believing that MWI clearly wins based on our current knowledge and testing it further are not mutually exclusive. (At least not any more than believing in collapse and testing it further.) For example, it inspired the quantum bomb test.
This argument feels to me a bit strawmanish, as if a creationist would accuse evolutionists of saying: “Evolution clearly wins! No need to test it further.” Well, this specific question seems settled, but it never meant that the whole biology research is over.
Just because something clearly wins today, that does not prevent people from exploring the details… and possibly coming to the opposite conclusion later. At some moment scientists believed that Newtonian physics was correct, and it did not prevent them from discovering relativity later.
In exactly the same way it is possible to believe in MWI and start constructing e.g. a quantum supercomputer. If MWI is wrong, it’s likely that during the construction of the supercomputer we will find the counter-evidence (e.g. encounter a situation where the hypothetical collapse really happens and the parallel branches stop interfering, which would make constructing quantum supercomputers with more than 42 bits impossible, or something like this). On the other hand, if we invent quantum supercomputer, quantum teleporter and quantum dishwasher without disproving MWI, I guess that would just make MWI more likely.
I agree that saying “I can make a Bayesian estimate based on little data” should not stop us from finding more data. On the other hand, there is always more data possible, but that should not prevent us from making temporary conclusions from the data we already have. I mean, why should we expect that the new data will move our estimate in a specific direction?
I agree that the two are not mutually exclusive. But strongly believing something makes you less likely to want to test it. After all, what’s the point if it’s “clearly true”? The expected surprise value is low, since the prior probability of encountering contrary evidence is low. This is fine for experimentally preferred models, like, say, energy conservation; there is really no point in testing it further by trying to construct various perpetual motion machines. However, if the preference of one model over another is based solely on inference and not on experiment, believing that preference tends to act as a curiosity stopper, which is a bad thing, since it’s Nature that is the ultimate arbiter between competing models, not faulty and biased human reasoning.
Re quantum computers/quantum bomb: both collapse and MWI lead to exactly the same predictions. I have yet to see anyone construct an experiment that unambiguously discriminates between the interpretations. (There was this handwaving by Deutsch about conscious reversible quantum computing based on some unspecified future technology, but few unbiased physicists take it seriously.)
Strongly believing in Catholic Trinity may make people less likely to test it. But strongly believing in electricity didn’t stop people from making thousands of electrical gadgets, each of them implicitly testing the correctness of the underlying theory.
The value of the electric gadgets is not in testing our knowledge of electricity, but they do it anyway. The value of GPS is not in testing the theory of relativity, but it confirms it anyway. Maybe one day quantum computers will be built for commercial reasons...
If the collapse interpretation is correct, it should be possible to prove it by designing an experiment which demonstrates the collapse, so the proponents of collapse theory should be interested in doing so. If the MWI interpretation is correct, then… well, the collapse theory is unfalsifiable: you can run a million experiments where the collapse didn’t happen, but that does not prove that it doesn’t happen when you are not looking.
So the main difference is that the proponents of MWI can do their experiments and predict the outcome according to their model of the world, while the proponents of collapse must always think “I believe there is a dragon in my garage, but it will have no influence on the specific outcome of this experiment”. Or less metaphorically: “I believe that the collapse exists, but I also believe it will never influence the outcome of any of my experiments”. It is still possible to do high-quality research using this belief, just like it is possible to do high-quality research while being religious.
EDIT: The critical thing is that if someone believes there is absolutely no difference between collapse and no collapse, then what the hell do they actually mean when they say “collapse”? (In real life, did you ever go to a shop to buy bread which was absolutely experimentally indistinguishable from no bread?) The only way I can interpret it meaningfully is that “collapse” means “the interference between branches becomes too small to measure by our instruments”. But that doesn’t feel like a fact about the territory.
Note that swapping the terms collapse and MWI (“world splitting”, to make it more concrete) in the above does not affect the [in]validity of this statement. This might give you a hint. Nah, who am I kidding.
So, what exactly does the word “collapse” mean to you?
Do you believe there is a specific moment when the collapse happens? Could you describe an experimental difference between the collapse happening at the time t or ten minutes later?
Or does the collapse happen gradually, like at some moment we have 10% of the collapse and at some later moment we have 90% of the collapse? What experimental observation would mean that the collapse is at 50%?
(My assumption was that the collapse happens at some completely unspecified moment after the interference between different branches becomes too small to measure by our instruments. Which pattern-matches to “when we can’t observe X, it suddenly changes to Y, because we believe in Y and not in X.” But if my definition of the collapse is wrong, please tell me the correct one so I can stop making strawman arguments.)
I have outlined my views on this several times on this forum. I do not favor one interpretation over another based solely on logic. I am very skeptical of any kind of “objective collapse” for the same reasons Eliezer is, because in the EPR setup (singlet decay) it would require some sort of correlation between spacelike-separated processes. But stranger things have happened to physics, so who knows. For the same reason I am also skeptical of the naive MWI, since the two pairs of split worlds would have to “recombine” in just the right way to produce only two worlds when the measurements are compared. I am slightly partial to Rovelli’s Relational QM (Eliezer hates it), because it provides an explicit ontology matching the shut-up-and-calculate non-interpretation (unitary evolution + Born rule), without postulating any invisible processes, like collapse or world-splitting. I think that Bohmian mechanics is extremely unlikely to reflect the “reality” (i.e. the potential future model one level deeper than the present-day QM), if only because it’s so clunky.
As I said before, we have no good model of how a single observed eigenstate emerges during an irreversible multi-state interaction (known as the measurement), and this biggest mystery in all of quantum physics deserves more research. The magic incantation “MWI” is a curiosity-stopping spell, which might be interesting in the HPMoR universe, but has no value in the one we live in.
Thanks for the answer! I wish someone would explain to me RQM in simple terms, then I could make an opinion about it.
I know this probably doesn’t sound good, but I remember reading in Feynman’s book how, when he was talking to mathematicians about complex mathematical things, he tried to imagine a specific object, like a ball, and then apply what they said to that specific object; and if it didn’t make sense, he objected. I am essentially trying to do something similar, but with the concepts of physics. So I’d like to hear a story about what happens with the ball in the MWI multiverse, and what happens in the RQM multiverse. (I am not asking you specifically to do that; I’m just expressing my wish.)
This is exactly the failure mode I was targeting with this idea. Thank you for stating it so well.
If you think it’s a good idea to do a test when the VoI calculation has a negative EV, either you’re doing the VoI calculation wrong, or you’re mistaken about it being a good idea. I think another way to look at this is “if the VoI calculation says the value of the test is small, that means you should still do it if the cost of the test is small.”
For example, I have the habit of trying to report two pieces of information about dates (“Saturday the 20th,” for example), and when I see a date like that I pull out a calendar to check. (Turns out the 20th is a Sunday this month.) This makes it easier for me to quickly catch others’ mistakes, and for others to quickly catch my mistakes. Building that sort of redundancy into a system seems like a good idea, and I think the VoI numbers agree.
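A minimal value-of-information sketch along those lines (the numbers are mine and purely illustrative): the calendar check is worth doing whenever the expected loss it can avert exceeds its cost.

```python
# Purely illustrative numbers: a cheap redundant check pays for itself
# even when the chance of being wrong is small.
P_WRONG = 0.05         # chance the remembered "Saturday the 20th" is wrong
LOSS_IF_WRONG = 40.0   # cost of acting on the wrong date
COST_OF_CHECK = 0.5    # pulling out a calendar

# Assuming the check fully resolves the question, the value of the
# information is the expected loss it lets you avoid.
value_of_information = P_WRONG * LOSS_IF_WRONG

print("check the calendar" if value_of_information > COST_OF_CHECK else "don't bother")
```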
One of us may have dropped the “naive” modifier from the VoI. I meant that doing a straightforward VoI calculation the way you would have learned to by, e.g., implementing CFAR’s advice would not automatically include the value of simpler models and self-calibration and other general-utility effects that are negligible in the current decision, but significant when added up outside the current context.
Also, good example of the value of redundant information. I agree that a well done VOI should catch that, but I assert that this has to be learned.
Hm. I’m not sure I agree with this, but that’s partly because I’m not sure exactly what you’re saying. A charitable read is that there’s reason to expect overconfidence when considering each situation individually and more correct confidence when considering all situations together because of a psychological quirk. An uncharitable read is that you can have a recurring situation where Policy A chooses an option with negative EV every time, and Policy B chooses an option with positive EV every time, but Policy A has a total higher EV than Policy B. (This can only happen with weird dependencies between the variables and naive EV calculations.)
I do agree that the way to urgify information-seeking and confusion-reducing actions and goals is to have a self-image of someone who gets things right, and to value precision as well as just calibration, and that this is probably more effective in shifting behavior and making implicit VoI calculations come out correctly.
I certainly learned it the hard way!
I think this is an outside view/inside view distinction. By “straightforward VOI” I think we’re talking about an inside view VOI. So the thesis here could be restated as “outside view VOI is usually higher than inside view VOI, especially in situations with lots of uncertainty.”
EDIT: Now that I’m thinking about it, I bet that could be formalized.
An alternative to “Crush Your Uncertainty” is “Maximize Your Freedom”. Uncertainty is not so much of a problem if you can live with either outcome.
This is not intended as a recommendation for freedom; it just puts things into a wider perspective. The need to crush your uncertainty is larger when the price of a misjudgement is higher.
The rule is thus rather: “Crush your uncertainty times cost”. And I bet there are game-theoretical results about that.
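One way to sketch that rule (my own toy numbers, not a quoted game-theoretic result): rank your open questions by the probability that you are wrong times the cost of acting on the wrong belief, and crush the uncertainty at the top of the list first.

```python
# Toy numbers, for illustration: prioritize investigations by
# (probability you're wrong) * (cost of acting on the wrong belief).
open_questions = {
    "is the corrosion hypothesis actually right?": (0.05, 50.0),
    "was it really 7 kg of zinc that was delivered?": (0.30, 200.0),
    "which supplier's solder to use?": (0.40, 2.0),
}

ranked = sorted(open_questions.items(),
                key=lambda item: item[1][0] * item[1][1],
                reverse=True)

for question, (p_wrong, cost) in ranked:
    print(f"{p_wrong * cost:7.1f}  {question}")
```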
To paraphrase Keynes: asymptotically, all methods work.
Looking for ways to take more data is a good idea. Pretending you can always crush your uncertainty with available data is not.
Bayesian analysis is expensive? Compared to what?
You know what’s more expensive? Trying something else. Bayesian analysis practically writes itself (I use Jaynes’ notation). I break it out when I’m just trying to clarify my model of a situation. I find it so much easier than the sea of special case techniques and jargon in classical stat.
Well… It depends on how much it would cost to collect more evidence!
That’s not a realistic goal for most non-trivial real-life situations.
I have a nasty suspicion that trying to “crush the uncertainty” will lead to overconfidence bias. Nassim Taleb’s black swans are also relevant here.
Here we run into the difference between threshold goals and directional goals. The above was a directional goal.
Hence the very small lip service paid with “unjustified confidence is even less cool”. Of course that could use some expansion. Maybe in a different post.