Are you sure you’re not just worried about poor calibration?
No, my objection is fundamental. I provide a brief explanation in the comment I linked to, but I’ll restate it here briefly.
The problem is that the algorithms that your brain uses to perform common-sense reasoning are not transparent to your conscious mind, which has access only to their final output. This output does not provide a numerical probability estimate, but only a rough and vague feeling of certainty. Yet in most situations, the output of your common sense is all you have. There are very few interesting things you can reason about by performing mathematically rigorous probability calculations (and even when you can, you still have to use common sense to establish the correspondence between the mathematical model and reality).
Therefore, there are only two ways in which you can arrive at a numerical probability estimate for a common-sense belief:
Translate your vague feeling of certainty into a number in some arbitrary manner. This however makes the number a mere figure of speech, which adds absolutely nothing over the usual human vague expressions for different levels of certainty.
Perform some probability calculation, which however has nothing to do with how your brain actually arrived at your common-sense conclusion, and then assign the probability number produced by the former to the latter. This is clearly fallacious.
Honestly, all this seems entirely obvious to me. I would be curious to see which points in the above reasoning are supposed to be even controversial, let alone outright false.
Translate your vague feeling of certainty into a number in some arbitrary manner. This however makes the number a mere figure of speech, which adds absolutely nothing over the usual human vague expressions for different levels of certainty.
Disagree here. Numbers get people to convey more information about their beliefs. It doesn’t matter whether you actually use numbers, or do something similar (and equivalent) like systematize the use of vague expressions. I’d be just as happy if people used a “five-star” system, or even in many cases if they just compared the belief in question to other beliefs used as reference-points.
Perform some probability calculation, which however has nothing to do with how your brain actually arrived at your common-sense conclusion, and then assign the probability number produced by the former to the latter. This is clearly fallacious.
Disagree here also. The probability calculation you present should represent your brain’s reasoning, as revealed by introspection. This is not a perfect process, and may be subject to later refinement. But it is definitely meaningful.
For example, consider my current probability estimate of 10^(-3) that Amanda Knox killed her roommate. On my current analysis, this is obtained as follows: I start with a prior of 10^(-4) (from a general homicide rate of about 10^(-3), plus reasoning that Knox is demographically an order of magnitude less likely to kill than the typical person; the figure happens to match my intuitive sense that I’d have to meet about 10,000 similar people before I’d have any fear for my life). Then all the evidence in the case raises the probability by about an order of magnitude at most, yielding 10^(-3).
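In code, the odds-form version of this update looks like the following minimal sketch (the prior and the factor-of-ten evidence bound are the numbers above; the helper function and its name are mine):

```python
# Odds-form Bayesian update. Prior and likelihood ratio are the numbers
# from the comment above; the helper name is mine.

def update_odds(prior_prob, likelihood_ratio):
    """Multiply prior odds by a likelihood ratio; return the posterior probability."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

prior = 1e-4           # demographic prior for guilt
evidence_factor = 10   # generous upper bound on the total strength of the evidence
print(f"{update_odds(prior, evidence_factor):.1e}")  # ~1.0e-03, i.e. 10^(-3)
```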
Now, this is just a rough order-of-magnitude argument. But it’s already much more meaningful and useful than my just saying “I don’t think she did it”. It provides a way of breaking down the reasoning, so that points of disagreement can be precisely identified in an efficient manner. (If you happened to disagree, the next step would be to say something like “but surely evidence X alone raises the odds by more than a factor of ten”, and then we’d iterate the process specifically on X rather than the original proposition.)
It’s a very useful technique for keeping debates informative, and preventing them from turning into (pure) status signaling contests.
Numbers get people to convey more information about their beliefs. It doesn’t matter whether you actually use numbers, or do something similar (and equivalent) like systematize the use of vague expressions. I’d be just as happy if people used a “five-star” system, or even in many cases if they just compared the belief in question to other beliefs used as reference-points.
If I understand correctly, you’re saying that talking about numbers rather than the usual verbal expressions of certainty prompts people to be more careful and re-examine their reasoning more strictly. This may be true sometimes, but on the other hand, numbers also tend to give a false feeling of accuracy and rigor where there is none. One of the usual symptoms (and, in turn, catalysts) of pseudoscience is the use of numbers with spurious precision and without rigorous justification.
In any case, you seem to concede that these numbers ultimately don’t convey any more information than various vague verbal expressions of confidence. If you want to make the latter more systematic and clear, I have no problem with that, but I see no way to turn them into actual numbers without introducing spurious precision.
The probability calculation you present should represent your brain’s reasoning, as revealed by introspection. This is not a perfect process, and may be subject to later refinement. But it is definitely meaningful.
Trouble is, this is often not possible. Most of what happens in your brain is not amenable to introspection, and you cannot devise a probability calculation that will capture all the important things that happen there. Take your own example:
For example, consider my current probability estimate of 10^(-3) that Amanda Knox killed her roommate. On my current analysis, this is obtained as follows: I start with a prior of 10^(-4) (from a general homicide rate of about 10^(-3), plus reasoning that Knox is demographically an order of magnitude less likely to kill than the typical person; the figure happens to match my intuitive sense that I’d have to meet about 10,000 similar people before I’d have any fear for my life). Then all the evidence in the case raises the probability by about an order of magnitude at most, yielding 10^(-3).
See, this is where, in my opinion, you’re introducing spurious numerical claims that are at best unnecessary and at worst outright misleading.
First you note that murderers are extremely rare, and that AK is a sort of person especially unlikely to be one. OK, say you can justify these numbers by looking at crime statistics. Then you perform a complex common-sense evaluation of the evidence, and your brain tells you that on the whole it’s weak, so it’s highly unlikely that AK killed the victim. So far, so good. But then you insist on turning this feeling of near-certainty about AK’s innocence into a number, and you end up making a quantitative claim that has no justification at all. You say:
Now, this is just a rough order-of-magnitude argument. But it’s already much more meaningful and useful than my just saying “I don’t think she did it”.
I strongly disagree. Neither is this number you came up with any more meaningful than the simple plain statement “I think it’s highly unlikely she did it,” nor does it offer any additional practical benefit. On the contrary, it suggests that you can actually make a mathematically rigorous case that the number is within some well-defined limits. (Which you do disclaim, but which is easy to forget.)
Even worse, your claims suggest that while your numerical estimates may be off by an order of magnitude or so, the model you’re applying to the problem captures reality well enough that it’s only necessary to plug in accurate probability estimates. But how do you know that the model is correct in the first place? Your numbers are ultimately based on an entirely non-mathematical application of common sense in constructing this model—and the uncertainty introduced there is altogether impossible for you to quantify meaningfully.
Let’s see if we can try to hug the query here. What exactly is the mistake I’m making when I say that I believe such-and-such is true with probability 0.001?
Is it that I’m not likely to actually be right 999 times out of 1000 occasions when I say this? If so, then you’re (merely) worried about my calibration, not about the fundamental correspondence between beliefs and probabilities.
Or is it, as you seem now to be suggesting, a question of attire: no one has any business speaking “numerically” unless they’re (metaphorically speaking) “wearing a lab coat”? That is, using numbers is a privilege reserved for scientists who’ve done specific kinds of calculations?
It seems to me that the contrast you are positing between “numerical” statements and other indications of degree is illusory. The only difference is that numbers permit an arbitrarily high level of precision; their use doesn’t automatically imply a particular level. Even in the context of scientific calculations, the numbers involved are subject to some particular level of uncertainty. When a scientist makes a calculation to 15 decimal places, they shouldn’t be interpreted as distinguishing between different 20-decimal-digit numbers.
Likewise, when I make the claim that the probability of Amanda Knox’s guilt is 10^(-3), that should not be interpreted as distinguishing (say) between 0.001 and 0.002. It’s meant to be distinguished from 10^(-2) and (perhaps) 10^(-4). I was explicit about this when I said it was an order-of-magnitude estimate. You may worry that such disclaimers are easily forgotten—but this is to disregard the fact that similar disclaimers always apply whenever numbers are used in any context!
In any case, you seem to concede that these numbers ultimately don’t convey any more information than various vague verbal expressions of confidence. If you want to make the latter more systematic and clear, I have no problem with that, but I see no way to turn them into actual numbers without introducing spurious precision.
Here’s the way I do it: I think approximately in terms of the following “scale” of improbabilities (a rough code sketch of this scale follows the list):
(1) 10% to 50% (mundane surprise)
(2) 1% to 10% (rare)
(3) 0.1% (=10^(-3)) to 1% (once-in-a-lifetime level surprise on an important question)
(4) 10^(-6) to 10^(-3) (dying in a plane crash or similar)
(5) 10^(-10) to 10^(-6) (winning the lottery; having an experience unique among humankind)
(6) 10^(-100) to 10^(-10) (religions are true)
(7) below 10^(-100) (theoretical level of improbability reached in thought experiments).
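As promised, here is a minimal sketch of the scale as code, assuming the category boundaries exactly as listed (the labels are my abbreviations of the descriptions above, and the treatment of shared boundaries is a judgment call):

```python
import bisect

# Category upper bounds, from category (7) up through category (1).
BOUNDS = [1e-100, 1e-10, 1e-6, 1e-3, 1e-2, 1e-1]
LABELS = ["(7) thought-experiment territory", "(6) religions are true",
          "(5) lottery-level", "(4) plane-crash-level",
          "(3) once-in-a-lifetime surprise", "(2) rare", "(1) mundane surprise"]

def category(p):
    """Map an improbability p (0 < p <= 0.5) to its place on the scale."""
    return LABELS[bisect.bisect(BOUNDS, p)]

print(category(1e-3))  # (3) once-in-a-lifetime surprise
print(category(0.3))   # (1) mundane surprise
```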
Love the logic and the scale, although I think Vladimir_M pokes some important holes specifically at the 10^(-2) to 10^(-3) level.
May I suggest “unplanned-for errors”? In my experience, it is not useful to plan for contingencies with about a 1⁄300 chance of happening per trial. For example, on any given day of the year, my favorite cafe might be closed due to the owner’s illness, but I do not call the cafe first to confirm that it is open each time I go there. At any given time, one of my 300-ish acquaintances is probably nursing a grudge against me, but I do not bother to open each conversation with “Hi, do you still like me today?” When, as inevitably happens, I run into a closed cafe or a hostile friend, I usually stop short for a bit; my planning mechanism reports a bug; there is no ‘action string’ cached for that situation, for the simple reason that I was not expecting the situation, because I did not plan for the situation, because that is how rare it is. Nevertheless, I am not ‘surprised’—I know at some level that things that happen about 1⁄300 times are prone to happening once in a while. On the other hand, I would be ‘surprised’ if my favorite cafe had burned to the ground or if my erstwhile buddy had taken a permanent vow of silence. I expect that these things will never happen to me, and so if they do happen I go and double-check my calculations and assumptions, because it seems equally likely that I am wrong about my assumptions and that the 1⁄30,000 event actually occurred. Anyway, the point is that a category 3 event is an event that makes you shut up for a moment but doesn’t make you reexamine any core beliefs.
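A minimal sketch of the arithmetic behind this, using the 1⁄300 and 1⁄30,000 figures above and assuming (my assumption) one independent trial per day over a year:

```python
# Chance of at least one occurrence of a probability-p event over n
# independent trials: 1 - (1 - p)**n.

def at_least_once(p, n):
    return 1 - (1 - p) ** n

print(at_least_once(1 / 300, 365))     # ~0.70: the closed cafe is a "when", not an "if"
print(at_least_once(1 / 30_000, 365))  # ~0.012: the burned-down cafe stays surprising
```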
If you hold most of your core beliefs with probability > .993 then you are almost certainly overconfident in your core beliefs. I’m not talking about stuff like “my senses offer moderately reliable evidence” or “F = GMm/r^2”; I’m talking about stuff like “Solomonoff induction predicts that hyperintelligent AIs will employ a timeless decision theory.”
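A back-of-the-envelope way to see the force of this (the count of 100 core beliefs is my hypothetical, not a measured figure): p > .993 commits you to fewer than one error per hundred such beliefs, which is implausibly good for speculative claims.

```python
# Expected number of false beliefs among n held at confidence p,
# assuming rough independence: n * (1 - p). The n here is hypothetical.

n_beliefs, confidence = 100, 0.993
print(n_beliefs * (1 - confidence))  # ~0.7: under one expected error in 100
```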
(3) 0.1% (=10^(-3)) to 1% (once-in-a-lifetime level surprise on an important question)
10^-3 is roughly the probability that I try to start my car and it won’t start because the battery has gone bad. Is the scale intended only for questions one asks once per lifetime? There are lots of questions that one asks once a day, hence my car example.
That is precisely why I added the phrase “on an important question”. It was intended to rule out exactly those sorts of things.
The intended reference class (for me) consists of matters like the Amanda Knox case. But if I got into the habit of judging similar cases every day, that wouldn’t work either. Think “questions I might write a LW post about”.
What exactly is the mistake I’m making when I say that I believe such-and-such is true with probability 0.001? Is it that I’m not likely to actually be right 999 times out of 1000 occasions when I say this? If so, then you’re (merely) worried about my calibration, not about the fundamental correspondence between beliefs and probabilities.
It’s not that I’m worried about your poor calibration in some particular instance, but that I believe that accurate calibration in this sense is impossible in practice, except in some very special cases.
(To give some sense of the problem, if such calibration were possible, then why not calibrate yourself to generate accurate probabilities about stock market movements and bet on them? It would be an easy and foolproof way to get rich. But of course there is no way you can make your numbers match reality, not in this problem, nor in most other ones.)
Or is it, as you seem now to be suggesting, a question of attire: no one has any business speaking “numerically” unless they’re (metaphorically speaking) “wearing a lab coat”? That is, using numbers is a privilege reserved for scientists who’ve done specific kinds of calculations?
The way you put it, “scientists” sounds too exclusive. Carpenters, accountants, cashiers, etc. also use numbers and numerical calculations in valid ways. However, their use of numbers can ultimately be scrutinized and justified in similar ways as the scientific use of numbers (even if they themselves wouldn’t be up to that task), so with that qualification, my answer would be yes.
(And unfortunately, in practice it’s not at all rare to see people using numbers in ways that are fundamentally unsound, which sometimes gives rise to whole edifices of pseudoscience. I discussed one such example from economics in this thread.)
Now, you say:
It seems to me that the contrast you are positing between “numerical” statements and other indications of degree is illusory. The only difference is that numbers permit an arbitrarily high level of precision; their use doesn’t automatically imply a particular level. Even in the context of scientific calculations, the numbers involved are subject to some particular level of uncertainty. When a scientist makes a calculation to 15 decimal places, they shouldn’t be interpreted as distinguishing between different 20-decimal-digit numbers.
However, when a scientist makes a calculation with 15 digits of precision, or even just one, they must be able to rigorously justify this degree of precision by pointing to observations that are incompatible with the hypothesis that any of these digits, except the last one, is different. (Or, in the case of mathematical constants such as pi and e, to proofs of the formulas used to calculate them.) This disclaimer is implicit in any scientific use of numbers. (Assuming valid science is being done, of course.)
And this is where, in my opinion, you construct an invalid analogy:
Likewise, when I make the claim that the probability of Amanda Knox’s guilt is 10^(-3), that should not be interpreted as distinguishing (say) between 0.001 and 0.002. It’s meant to be distinguished from 10^(-2) and (perhaps) 10^(-4). I was explicit about this when I said it was an order-of-magnitude estimate. You may worry that such disclaimers are easily forgotten—but this is to disregard the fact that similar disclaimers always apply whenever numbers are used in any context!
But these disclaimers are not at all the same! The scientist’s—or the carpenter’s, for that matter—implicit disclaimer is: “This number is subject to this uncertainty interval, but there is a rigorous argument why it cannot be outside that range.” On the other hand, your disclaimer is: “This number was devised using an intuitive and arbitrary procedure that doesn’t provide any rigorous argument about the range it must be in.”
And if I may be permitted such a comment, I do see lots of instances here where people seem to forget about this disclaimer, and write as if they believed that they could actually become Bayesian inferers, rather than creatures who depend on capricious black-box circuits inside their heads to make any interesting judgment about anything, and who are (with the present level of technology) largely unable to examine the internal functioning of these boxes and improve them.
Here’s the way I do it: I think approximately in terms of the following “scale” of improbabilities:
I don’t think such usage is unreasonable, but I think it falls under what I call using numbers as vague figures of speech.
To give some sense of the problem, if such calibration were possible, then why not calibrate yourself to generate accurate probabilities about the stock market movements and bet on them? It would be an easy and foolproof way to get rich.
I think this statement reflects either an ignorance of finance or the Dark Arts.
First, the stock market is the single worst place to try to test out ideas about probabilities, because so many other people are already trying to predict the market, and so much wealth is at stake. Other people’s predictions will remove most of the potential for arbitrage (reducing ‘signal’), and the insider trading and other forms of cheating generated by the potential for quick wealth will further distort any scientifically detectable trends in the market (increasing ‘noise’). Because investments in the stock market must be made in relatively large quantities to avoid losing your money through trading commissions, a casual theory-tester is likely to run out of money long before hitting a good payoff even if he or she is already well-calibrated.
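The commissions point is easy to quantify; here is a minimal sketch with invented but plausible numbers for bankroll, edge, and cost per trade (all three are my assumptions):

```python
# Fixed costs against a small bankroll can make a genuine edge unprofitable.

bankroll = 1_000.00   # assumed small test bankroll
edge = 0.005          # assumed 0.5% expected gross return per round trip
commission = 10.00    # assumed fixed cost per round trip

print(bankroll * edge - commission)  # -5.0: negative expectation despite a real edge
```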
Of course, in real life, people might be moderately-calibrated. The fact that one is capable of making some predictions with some accuracy and precision is not a guarantee that one will be able to reliably and detectably beat even a thin market like a political prediction clearinghouse. Nevertheless, some information is often better than none: I am (rationally) much more concerned about automobile accidents than fires, despite the fact that I know two people who have died in fires and none who have died in automobile accidents. I know this based on my inferences from published statistics, the reliability of which I make further inferences about. I am quite confident (p ~ .95) that it is sensible to drive defensively (at great cost in effort and time) while essentially ignoring fire safety (even though checking a fire extinguisher or smoke detector might take minimal effort.)
I don’t play the stock market, though. I’m not that well calibrated, and probably nobody is without access to inside info of one kind or another.
I think this statement reflects either an ignorance of finance or the Dark Arts.
I’m not an expert on finance, but I am aware of everything you wrote about it in your comment. So I guess this leaves us with the second option. The Dark Arts hypothesis is probably that I’m using the extreme example of the stock market to suggest a general sweeping conclusion that in fact doesn’t hold in less extreme cases.
To which I reply: yes, the stock market is an extreme example, but I honestly can’t think of any other examples that would show otherwise. There are many examples of scientific models that provide more or less accurate probability estimates for all kinds of things, to be sure, but I have yet to hear about people achieving practical success in anything relevant by translating their common-sense feelings of confidence in various beliefs into numerical probabilities.
In my view, calibration of probability estimates can succeed only if (1) you come up with a valid scientific model which you can then use in a shut-up-and-calculate way instead of applying common sense (though you still need it to determine whether the model is applicable in the first place), or (2) you make an essentially identical judgment many times, and from your past performance you extrapolate how frequently the black box inside your head tends to be right.
Now, you try to provide some counterexamples:
I am (rationally) much more concerned about automobile accidents than fires, despite the fact that I know two people who have died in fires and none who have died in automobile accidents. I know this based on my inferences from published statistics, the reliability of which I make further inferences about. I am quite confident (p ~ .95) that it is sensible to drive defensively (at great cost in effort and time) while essentially ignoring fire safety (even though checking a fire extinguisher or smoke detector might take minimal effort.)
Frankly, the only subjective probability estimate I see here is the p~0.95 for your belief about driving. In this case, I’m not getting any more information from this number than if you just described your level of certainty in words, nor do I see any practical application to which you can put this number. I have no objection to your other conclusions, but I see nothing among them that would be controversial to even the most extreme frequentist.
Not sure who voted down your reply; it looks polite and well-reasoned to me.
I believe you when you say that the stock market was honestly intended as representative, although, of course, I continue to disagree about whether it actually is representative.
Here are some more counterexamples (a rough expected-value sketch follows the list):
*When deciding whether to invest in an online bank that pays 1% interest or a local community bank that pays 0.1% interest, I must calculate the odds that each bank will fail before I take my money out; I cannot possibly have a scientific model that generates replicable results for these two banks while also holding down a day job, but numbers will nevertheless help me make a decision that is not driven by an emotional urge to stay with (or leave) an old bank based on customer service considerations that I rationally value as far less than the value of my principal.
*When deciding whether to donate time, money, or neither to a local election campaign, it will help to know which of my donations will have a 10^-6 chance, a 10^-4 chance, and a 10^-2 chance of swinging the election. Numbers are important here because irrational friends and colleagues will urge me to do what ‘feels right’ or to ‘do my part’ without pausing to consider whether this serves any of our goals. If I can generate a replicable scientific model that says whether an extra $500 will win an election, I should stop electioneering and sign up for a job as a tenured political science faculty member, but I nevertheless want to know what the odds are, approximately, in each case, if only so that I can pick which campaign to work on.
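Here is the promised expected-value sketch of both comparisons; every probability and valuation in it is an illustrative guess of the kind discussed above, not a measured quantity:

```python
# Bank choice: expected one-year value per dollar deposited, given a
# guessed failure probability and a guessed recovery fraction on failure.

def expected_value(rate, p_fail, recovery=0.0):
    return (1 - p_fail) * (1 + rate) + p_fail * recovery

print(expected_value(0.01, p_fail=0.002))   # online bank, guessed 0.2% failure odds
print(expected_value(0.001, p_fail=0.001))  # local bank, guessed 0.1% failure odds

# Donation choice: expected impact = P(swinging the election) * value of a win.
value_of_win = 10_000  # hypothetical subjective value, in arbitrary units
for p_swing in (1e-6, 1e-4, 1e-2):
    print(p_swing, p_swing * value_of_win)
```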
As for your objection that:
the only subjective probability estimate I see here is the p~0.95 for your belief about driving. In this case, I’m not getting any more information from this number than if you just described your level of certainty in words,
I suppose I have left a few steps out of my analysis, which I am spelling out in full now (a numeric sketch follows the list):
*Published statistics say that the risk of dying in a fire is 10^-7 per person-year and the risk of dying in a car crash is 10^-4 per person-year (a report of what is no doubt someone else’s subjective but relatively evidence-based estimate).
*The odds that these statistics are off by more than a factor of 10 relative to each other are less than 10^-1 (a subjective estimate).
*My cost in effort to protect against car crashes is more than 10 times higher than my cost in effort to protect against fires.
*I value the disutility of death-by-fire and death-by-car-crash roughly equally.
*Therefore, there exists a coherent utility function with respect to the relevant states of the world and my relevant strategies such that it is rational for me to protect against car crashes but not fires.
*Therefore, one technique that could be used to show that my behavior is internally incoherent has failed to reject the null hypothesis.
*Therefore, I have some Bayesian evidence that my behavior is rational.
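The same chain reduced to arithmetic, as a minimal sketch (the risk figures are the ones quoted above; the factor-of-10 cost ratio and error bound are the estimates from the list):

```python
# Steps 1-3 above as arithmetic. Risk figures are per person-year.

risk_fire = 1e-7   # quoted statistic: death by fire
risk_car = 1e-4    # quoted statistic: death by car crash
cost_ratio = 10    # protecting against crashes costs >10x the effort
stat_error = 10    # relative error the statistics might carry (p < 0.1 of exceeding)

# Even if the statistics are off by their full factor of 10 against cars,
# crash risk per unit of protective effort still dominates fire risk.
print(risk_car / stat_error / cost_ratio > risk_fire)  # True: defensive driving wins
```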
Please let me know if you still think I’m just putting fancy arithmetic labels on what is essentially ‘frequentist’ reasoning, and, if so, exactly what you mean by ‘frequentist.’ I can Wikipedia the standard definition, but it doesn’t quite seem to fit here, imho.
Regarding your examples with banks and donations, when I imagine myself in such situations, I still don’t see how numbers derived from my own common-sense reasoning can be useful. I can see myself making a decision based on a simple common-sense impression that one bank looks less shady, or that it’s bigger and thus more likely to be bailed out, etc. Similarly, I could act on a vague impression that one political candidacy I’d favor is far more hopeless than another, and so on. On the other hand, I could also judge from the results of calculations based on numbers from real expert input, like actuarial tables for failures of banks of various types, or the poll numbers for elections, etc.
What I cannot imagine, however, is doing anything sensible and useful with probabilities dreamed up from vague common-sense impressions. For example, looking at a bank, getting the impression that it’s reputable and solid, and then saying, “What’s the probability it will fail before time T? Um… seems really unlikely… let’s say 0.1%,” and then using this number to calculate my expected returns.
Now, regarding your example with driving vs. fires, suppose I simply say: “Looking at the statistical tables, one is far more likely to be killed in a car accident than in a fire. I don’t see any way in which I’m exceptional in my exposure to either, so if I want to make myself safer, it would be stupid to invest more effort in reducing the chance of fire than in more careful driving.” What precisely have you gained with your calculation relative to this plain and clear English statement?
In particular, what is the significance of these subjectively estimated probabilities like p=10^-1 in step 2? What more does this number tell us than a simple statement like “I don’t think it’s likely”? Also, notice that my earlier comment specifically questioned the meaningfulness and practical usefulness of the numerical claim that p~0.95 for this conclusion, and I don’t see how it comes out of your calculation. These seem to be exactly the sorts of dreamed-up probability numbers whose meaningfulness I’m denying.
It seems plausible to me that routinely assigning numerical probabilities to predictions/beliefs that can be tested and tracking these over time to see how accurate your probabilities are (calibration) can lead to a better ability to reliably translate vague feelings of certainty into numerical probabilities.
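The bookkeeping this kind of tracking requires is minimal. A sketch of bucketed calibration checking (the sample predictions are invented, and the bucketing granularity is my choice):

```python
from collections import defaultdict

# Record (stated probability, outcome) pairs; compare each bucket's stated
# level with its observed frequency. The sample data below is invented.
predictions = [(0.9, True), (0.9, True), (0.9, False),
               (0.6, True), (0.6, False), (0.6, True)]

buckets = defaultdict(list)
for p, outcome in predictions:
    buckets[round(p, 1)].append(outcome)

for p, outcomes in sorted(buckets.items()):
    freq = sum(outcomes) / len(outcomes)
    print(f"stated {p:.1f} -> observed {freq:.2f} over {len(outcomes)} predictions")
```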
There are practical benefits to developing this ability. I would speculate that successful bookies and professional sports bettors are better at this than average, for example, and that this is an ability they have developed through practice and experience. Anyone who has to make decisions under uncertainty could benefit from a well-developed ability to assign well-calibrated numerical probability estimates to vague feelings of certainty. Investors, managers, engineers and others who must deal with uncertainty on a regular basis would surely find this ability useful.
I think a certain degree of skepticism is justified regarding the utility of various specific methods for developing this ability (things like predictionbook.com don’t yet have hard evidence for their effectiveness) but it certainly seems like it is a useful ability to have and so there are good reasons to experiment with various methods that promise to improve calibration.
I addressed this point in another comment in this thread:
http://lesswrong.com/lw/2sl/the_irrationality_game/2qgm
I agree with most of what you’re saying (in that comment and this one) but I still think that the ability to give well-calibrated probability estimates for a particular prediction is instrumentally useful and that it is fairly likely that this is an ability that can be improved with practice. I don’t take this to imply anything about humans performing actual Bayesian calculations either implicitly or explicitly.
Upvoted. Definitely can’t back you on this one.
Another upvote. That’s crazy talk.