I have read most of the responses and still am not sure whether to upvote or not. I am torn between several (possibly overlapping) interpretations of your statement. Could you tell me to what extent the following interpretations really reflect what you think?
(1) Confession of frequentism. The only sensible numerical probabilities are those related to frequencies, i.e. either frequencies of outcomes of repeated experiments, or probabilities derived from them. (Creative drawing of reference-class boundaries may be permitted.) In particular, prior probabilities are meaningless.
(2) Any sensible numbers must be produced using procedures that ultimately don’t include any numerical parameters (except perhaps small integers like 2, 3, 4). Any number which isn’t the result of such a procedure is labeled arbitrary, and therefore meaningless. (Observation and measurement, of course, do count as permitted procedures. Admittedly arbitrary steps, like choosing units of measurement, are also permitted.)
(3) Degrees of confidence shall be expressed without reflexive thinking about them. Trying to establish a fixed scale of confidence levels (like impossible—very unlikely—unlikely—possible—likely—very likely—almost certain—certain), or actively trying to compare degrees of confidence in different beliefs, is cheating, since such scales can then be converted into numbers using a non-numerical procedure.
(4) The question of whether somebody is well calibrated is confused for some reason. Calibrating people makes no sense. Although we may take the “almost certain” statements of a person and look at how often they are true, the resulting frequency is meaningless for some reason.
(5) Unlike #3, beliefs can be ordered or classified on some scale (possibly imprecisely), but assigning numerical values brings confusing connotations and should be avoided. Put another way, the meaning of subjective probabilities is preserved under monotonic rescaling.
(6) Although, strictly speaking, human reasoning can be modelled as a Bayesian network where beliefs have numerical strengths, human introspection is poor at assessing their values. Declared values depend more on anchoring than on the real strength of the belief. Speaking about numbers actually introduces noise into reasoning.
(7) Human reasoning cannot be modelled by Bayesian inference, not even approximately.
That’s an excellent list of questions! It will help me greatly to systematize my thinking on the topic.
Before replying to the specific items you list, perhaps I should first state the general position I’m coming from, which motivates me to get into discussions of this sort. Namely, it is my firm belief that when we look at the present state of human knowledge, one of the principal sources of confusion, nonsense, and pseudoscience is physics envy, which leads people in all sorts of fields to construct nonsensical edifices of numerology and then pretend, consciously or not, that they’ve reached some sort of exact scientific insight. Therefore, I believe that whenever one encounters people talking about numbers of any sort that look even slightly suspicious, those numbers should be considered guilty until proven otherwise—and this entire business with subjective probability estimates for common-sense beliefs doesn’t come even close to clearing that bar for me.
Now to reply to your list.
(1) Confession of frequentism. The only sensible numerical probabilities are those related to frequencies, i.e. either frequencies of outcomes of repeated experiments, or probabilities derived from them. (Creative drawing of reference-class boundaries may be permitted.) In particular, prior probabilities are meaningless.
(2) Any sensible numbers must be produced using procedures that ultimately don’t include any numerical parameters (except perhaps small integers like 2, 3, 4). Any number which isn’t the result of such a procedure is labeled arbitrary, and therefore meaningless. (Observation and measurement, of course, do count as permitted procedures. Admittedly arbitrary steps, like choosing units of measurement, are also permitted.)
My answer to (1) follows from my opinion about (2).
In my view, a number that gives any information about the real world must ultimately refer, either directly or via some calculation, to something that can be measured or counted (at least in principle, perhaps using a thought experiment). This doesn’t mean that all sensible numbers have to be derived from concrete empirical measurements; they can also follow from common-sense insight and generalization. For example, reading about Newton’s theory leads to the common-sense insight that it’s a very close approximation of reality under certain assumptions. Now, if we look at the gravity formula F = m1*m2/r^2 (in units set so that G=1), the exponent 2 in the denominator is not the product of any concrete measurement, but a generalization from common sense. Yet what makes it sensible is that it ultimately refers to measurable reality via a well-defined formula: measure the force between two bodies of known masses at distance r, and you’ll get log(m1*m2/F)/log(r) = 2.
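To make that verification procedure concrete, here is a minimal sketch (my own illustration; the masses and distances are invented) of how the exponent would be recovered from idealized measurements:

```python
import math

# Idealized "measurements" of the force between two bodies of known masses
# at various distances r, in units chosen so that G = 1 (values invented).
m1, m2 = 3.0, 5.0
for r in (2.0, 7.0, 11.0):
    F = m1 * m2 / r**2                                  # what the experiment would report
    exponent = math.log(m1 * m2 / F) / math.log(r)
    print(f"r = {r}: recovered exponent = {exponent:.3f}")  # always 2.000
```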
Now, what can we make out of probabilities from this viewpoint? I honestly can’t think of any sensible non-frequentist answer to this question. Subjectivist Bayesian phrases such as “the degree of belief” sound to me entirely ghostlike unless this “degree” is verifiable via some frequentist practical test, at least in principle. In this sense, I do confess frequentism. (Though I don’t wish to subscribe to all the related baggage from various controversies in statistics, much of which is frankly over my head.)
(3) Degrees of confidence shall be expressed without reflexive thinking about them. Trying to establish a fixed scale of confidence levels (like impossible—very unlikely—unlikely—possible—likely—very likely—almost certain—certain), or actively trying to compare degrees of confidence in different beliefs, is cheating, since such scales can then be converted into numbers using a non-numerical procedure.
That depends on the concrete problem under consideration, and on the thinker who is considering it. The thinker’s brain produces an answer alongside a more or less fuzzy feeling of confidence, and human language has the capacity to express these feelings with about the same level of fuzziness as that signal. It can be sensible to compare intuitive confidence levels, if such a comparison can be put to a practical (i.e. frequentist) test. Eight ordered intuitive levels of certainty might perhaps be too much, but with, say, four levels, I could produce four lists of predictions labeled “almost impossible,” “unlikely,” “likely,” and “almost certain,” such that common sense would tell us that, with near-certainty, those in each subsequent list would turn out to be true in ever greater proportion.
If I wish to express these probabilities as numbers, however, this is not a legitimate step unless the resulting numbers can be justified in the sense discussed above under (1) and (2). This requires justification both in the sense of defining what aspect of reality they refer to (where frequentism seems like the only answer), and guaranteeing that they will be accurate under empirical tests. If they can be so justified, then we say that the intuitive estimate is “well-calibrated.” However, calibration is usually not possible in practice, and there are only two major exceptions.
The first possible path towards accurate calibration is when the same person performs essentially the same judgment many times, and from the past performance we extract the frequency with which their brain tends to produce the right answer. If this level of accuracy remains roughly constant in time, then it makes sense to attach it as the probability to that person’s future judgments on the topic. This approach treats the relevant operations in the brain as a black box whose behavior, being roughly constant, can be subjected to such extrapolation.
The second possible path is reached when someone has a sufficient level of insight about some problem to cross the fuzzy boundary between common-sense thinking and an actual scientific model. Increasingly subtle and accurate thinking about a problem can result in the construction of a mathematical model that approximates reality well enough that, when applied in a shut-up-and-calculate way, it yields probability estimates that will subsequently be vindicated empirically.
(Still, deciding whether the model is applicable in some particular situation remains a common-sense problem, and the probabilities yielded by the model do not capture this uncertainty. If a well-established physical theory, applied by competent people, says that p=0.9999 for some event, common sense tells me that I should treat this event as near-certain—and, if repeated many times, that it will come out the unlikely way very close to one in 10,000 times. On the other hand, if p=0.9999 is produced by some suspicious model that looks like it might be a product of data-dredging rather than real insight about reality, common sense tells me that the event is not at all certain. But there is no way to capture this intuitive uncertainty with a sensible number. The probabilities coming from calibration of repeated judgment are subject to analogous unquantifiable uncertainty.)
There is also a third logical possibility, namely that some people in some situations have precise enough intuitions of certainty that they can quantify them accurately, just as some people can guess what time it is with remarkable precision without looking at a clock. But I see little evidence of this occurring in reality, and even if it does, these are very rare special cases.
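To illustrate the first path described above (the black-box track-record extrapolation), here is a minimal sketch with an invented track record:

```python
# Invented track record: outcomes of one person's repeated judgments on
# essentially the same recurring problem (True = the judgment was right).
past_judgments = [True, True, False, True, True, True, False, True, True, True]

hit_rate = sum(past_judgments) / len(past_judgments)

# If this accuracy stays roughly constant over time, it can be attached as
# the probability of that person's next judgment on the same topic.
print(f"probability attached to the next such judgment: {hit_rate:.2f}")  # 0.80
```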
(4) The question of whether somebody is well calibrated is confused for some reason. Calibrating people makes no sense. Although we may take the “almost certain” statements of a person and look at how often they are true, the resulting frequency is meaningless for some reason.
I disagree with this, as explained above. Calibration can be done successfully in the special cases I mentioned. However, in cases where it cannot be done, which includes the great majority of the actual beliefs and conclusions made by human brains, devising numerical probabilities makes no sense.
(5) Unlike #3, beliefs can be ordered or classified on some scale (possibly imprecisely), but assigning numerical values brings confusing connotations and should be avoided. Put another way, the meaning of subjective probabilities is preserved under monotonic rescaling.
This should be clear from the answer to (3).
[Continued in a separate comment below due to excessive length.]
I should first state the general position I’m coming from, which motivates me to get into discussions of this sort. Namely, it is my firm belief that when we look at the present state of human knowledge, one of the principal sources of confusion, nonsense, and pseudoscience is physics envy, which leads people in all sorts of fields to construct nonsensical edifices of numerology and then pretend, consciously or not, that they’ve reached some sort of exact scientific insight.
I’ll point out here that reversed stupidity is not intelligence, and that for every possible error, there is an opposite possible error.

In my view, if someone’s numbers are wrong, that should be dealt with on the object level (e.g. “0.001 is too low”, with arguments for why), rather than retreating to the meta level of “using numbers caused you to err”. The perspective I come from is wanting to avoid the opposite problem, where being vague about one’s beliefs allows one to get away without subjecting them to rigorous scrutiny. (This, too, by the way, is a major hallmark of pseudoscience.)
But I’ll note that even as we continue to argue under opposing rhetorical banners, our disagreement on the practical issue seems to have mostly evaporated; see here for instance. You also do admit in the end that fear of poor calibration is what underlies your discomfort with numerical probabilities:
If I wish to express these probabilities as numbers, however, this is not a legitimate step unless the resulting numbers can be justified… If they can be so justified, then we say that the intuitive estimate is “well-calibrated.” However, calibration is usually not possible in practice...
As a theoretical matter, I disagree completely with the notion that probabilities are not legitimate or meaningful unless they’re well-calibrated. There is such a thing as a poorly-calibrated Bayesian; it’s a perfectly coherent concept. The Bayesian view of probabilities is that they refer specifically to degrees of belief, and not anything else. We would of course like the beliefs so represented to be as accurate as possible; but they may not be in practice.
If my internal “Bayesian calculator” believes P(X) = 0.001, and X turns out to be true, I’m not made less wrong by having concealed the number, saying “I don’t think X is true” instead. Less embarrassed, perhaps, but not less wrong.
In my view, if someone’s numbers are wrong, that should be dealt with on the object level (e.g. “0.001 is too low”, with arguments for why), rather than retreating to the meta level of “using numbers caused you to err”.
Trouble is, sometimes numbers can be not even wrong, with their very definition lacking logical consistency or any defensible link with reality. It is that category that I am most concerned with, and I believe that it sadly occurs very often in practice, with entire fields of inquiry sometimes degenerating into meaningless games with such numbers. My honest impression is that in our day and age, such numerological fallacies have been responsible for much greater intellectual sins than the opposite fallacy of avoiding scrutiny by excessive vagueness, although the latter phenomenon is not negligible either.
You also do admit in the end that fear of poor calibration is what underlies your discomfort with numerical probabilities:
Here we seem to be clashing about terminology. I think that “poor calibration” is too much of a euphemism for the situations I have in mind, namely those where sensible calibration is altogether impossible. I would instead use some stronger expression clarifying that the supposed “calibration” is done without any valid basis, not that the result is poor because some unfortunate circumstance occurred in the course of an otherwise sensible procedure.
There is such a thing as a poorly-calibrated Bayesian; it’s a perfectly coherent concept. The Bayesian view of probabilities is that they refer specifically to degrees of belief, and not anything else.
As I explained in the above lengthy comment, I simply don’t find numbers that “refer specifically to degrees of belief, and not anything else” a coherent concept. We seem to be working with fundamentally different philosophical premises here.
Can these numerical “degrees of belief” somehow be linked to observable reality according to the criteria I defined in my reply to the points (1)-(2) above? If not, I don’t see how admitting such concepts can be of any use.
If my internal “Bayesian calculator” believes P(X) = 0.001, and X turns out to be true, I’m not made less wrong by having concealed the number, saying “I don’t think X is true” instead. Less embarrassed, perhaps, but not less wrong.
But if you do this 10,000 times, and the number of times X turns out to be true is small but nowhere close to 10, you are much more wrong than if you had just been saying “X is highly unlikely” all along.
On the other hand, if we’re observing X as a single event in isolation, I don’t see how this tests your probability estimate in any way. But I suspect we have some additional philosophical differences here.
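To spell out the 10,000-repetitions point, here is a minimal sketch (my own, with an invented “true” frequency) of how the stated 0.001 would be confronted with frequencies:

```python
import random

random.seed(0)

stated_p = 0.001   # the probability the internal "Bayesian calculator" announced
true_p = 0.02      # invented: the frequency with which X actually comes true

trials = 10_000
hits = sum(random.random() < true_p for _ in range(trials))

print(f"expected if the 0.001 were well calibrated: ~{stated_p * trials:.0f}")  # ~10
print(f"actually observed:                          {hits}")                    # around 200
```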
[Continued from the parent comment.]

(6) Although, strictly speaking, human reasoning can be modelled as a Bayesian network where beliefs have numerical strengths, human introspection is poor at assessing their values. Declared values depend more on anchoring than on the real strength of the belief. Speaking about numbers actually introduces noise into reasoning.
I have revised my view about this somewhat thanks to a shrewd comment by xv15. The use of unjustified numerical probabilities can sometimes be a useful figure of speech that will convey an intuitive feeling of certainty to other people more faithfully than verbal expressions. But the important thing to note here is that the numbers in such situations are mere figures of speech, i.e. expressions that exploit various idiosyncrasies of human language and thinking to transmit hard-to-convey intuitive points via non-literal meanings. It is not legitimate to use these numbers for any other purpose.
Otherwise, I agree. Except in the above-discussed cases, subjective probabilities extracted from common-sense reasoning are at best an unnecessary addition to arguments that would be just as valid and rigorous without them. At worst, they can lead to muddled and incorrect thinking based on a false impression of accuracy, rigor, and insight where there is none, and ultimately to numerological pseudoscience.
Also, we still don’t know whether and to what extent various parts of our brains involved in common-sense reasoning approximate Bayesian networks. It may well be that some, or even all of them do, but the problem is that we cannot look at them and calculate the exact probabilities involved, and these are not available to introspection. The fallacy of radical Bayesianism that is often seen on LW is in the assumption that one can somehow work around this problem so as to meaningfully attach an explicit Bayesian procedure and a numerical probability to each judgment one makes.
Note also that even if my case turns out to be significantly weaker under scrutiny, it may still be a valid counterargument to the frequently voiced position that one can, and should, attach a numerical probability to every judgment one makes.
So, that would be a statement of my position; I’m looking forward to any comments.
Suppose you have two studies, each of which measures and gives a probability for the same thing. The first study has a small sample size, and a not terribly rigorous experimental procedure; the second study has a large sample size, and a more thorough procedure. When called on to make a decision, you would use the probability from the larger study. But if the large study hadn’t been conducted, you wouldn’t give up and act like you didn’t have any probability at all; you’d use the one from the small study. You might have to do some extra sanity checks, and your results wouldn’t be as reliable, but they’d still be better than if you didn’t have a probability at all.
A probability assigned by common-sense reasoning is to a probability that came from a small study, as a probability from a small study is to a probability from a large study. The quality of probabilities varies continuously; you get better probabilities by conducting better studies. By saying that a probability based only on common-sense reasoning is meaningless, I think what you’re really trying to do is set a minimum quality level. Since probabilities that are based on studies and calculation are generally better than probabilities that aren’t, this is a useful heuristic. However, it is only that, a heuristic; probabilities based on common-sense reasoning can sometimes be quite good, and they are often the only information available anywhere (and they are, therefore, the best information). Not all common-sense-based probabilities are equal; if an expert thinks for an hour and then gives a probability, without doing any calculation, then that probability will be much better than if a layman thinks about it for thirty seconds. The best common-sense probabilities are better than the worst statistical-study probabilities; and besides, there usually aren’t any relevant statistical calculations or studies to compare against.
I think what’s confusing you is an intuition that if someone gives a probability, you should be able to take it as-is and start calculating with it. But suppose you had collected five large studies, and someone gave you the results of a sixth. You wouldn’t take that probability as-is, you’d have to combine it with the other five studies somehow. You would only use the new probability as-is if it was significantly better (larger sample, more trustworthy procedure, etc) than the ones you already had, or you didn’t have any before. Now if there are no good studies, and someone gives you a probability that came from their common-sense reasoning, you almost certainly have a comparably good probability already: your own common-sense reasoning. So you have to combine it. So in a sense, those sorts of probabilities are less meaningful—you discard them when they compete with better probabilities, or at least weight them less—but there’s still a nonzero amount of meaning there.
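A minimal sketch of that combining step (purely illustrative; the pooling rule, weights, and numbers are my own invention, not anything specified above):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Invented inputs: probabilities for the same claim from sources of varying
# quality, with weights standing in for how much each source is trusted.
sources = [
    (0.70, 1.0),   # someone else's common-sense estimate
    (0.60, 1.0),   # my own common-sense estimate
    (0.85, 4.0),   # a small study, trusted more heavily
]

pooled = inv_logit(
    sum(w * logit(p) for p, w in sources) / sum(w for _, w in sources)
)
print(f"pooled probability: {pooled:.2f}")   # roughly 0.80
```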
(Aside: I’ve been stuck for a while on an article I’m writing called “What Probability Requires”, dealing with this same topic, and seeing you argue the other side has been extremely helpful. I think I’m unstuck now; thank you for that.)
After thinking about your comment, I think this observation comes close to the core of our disagreement:
By saying that a probability based only on common-sense reasoning is meaningless, I think what you’re really trying to do is set a minimum quality level.
Basically, yes. More specifically, the quality level I wish to set is that the numbers must give more useful information than mere verbal expressions of confidence. Otherwise, their use at best simply adds nothing useful, and at worst leads to fallacious reasoning encouraged by a false feeling of accuracy.
Now, there are several possible ways to object to my position:
The first is to note that even if not meaningful mathematically, numbers can serve as communication-facilitating figures of speech. I have conceded this point.
The second way is to insist on an absolute principle that one should always attach numerical probabilities to one’s beliefs. I haven’t seen anything in this thread (or elsewhere) yet that would shake my belief in the fallaciousness of this position, or even provide any plausible-seeming argument in favor of it.
The third way is to agree that sometimes attaching numerical probabilities to common-sense judgments makes no sense, but that in some cases common-sense reasoning can nevertheless produce numerical probabilities that give more useful information than just fuzzy words. After the discussion with mattnewport and others, I agree that there are such cases, but I still maintain that these are rare exceptions. (In my original statement, I took an overly restrictive notion of “common sense”; I admit that in some cases, thinking that could reasonably be called by that name is indeed precise enough to produce meaningful numerical probabilities.)
So, to clarify, which exact position do you take in this regard? Or would your position require a fourth item to summarize fairly?
I think what’s confusing you is an intuition that if someone gives a probability, you should be able to take it as-is and start calculating with it. [...] So in a sense, those sorts of probabilities are less meaningful—you discard them when they compete with better probabilities, or at least weight them less—but there’s still a nonzero amount of meaning there.
I agree that there is a non-zero amount of meaning, but the question is whether it exceeds what a simple verbal statement of confidence would convey. If I can’t take a number and start calculating with it, what good is it? (Except for the caveat about possible metaphorical meanings of numbers.)
My response to this ended up being a whole article, which is why it took so long. The short version of my position is: we should attach numbers to beliefs as often as possible, but for instrumental reasons rather than on principle.
As a matter of fact, I can think of one reason—a strong reason in my view—that the consciously felt feeling of certainty is liable to be systematically and significantly exaggerated with respect to the true probability assigned by the person’s mental black box—the latter being something that we might in principle elicit through experimentation by putting the same subject through variants of a given scenario. (Think revealed probability assignment—similar to revealed preference as understood by economists.)
The reason is that whole-hearted commitment is usually best whatever one chooses to do. Consider Buridan’s ass, but with the following alterations. Instead of hay and water, to make it more symmetrical suppose the ass has two buckets of water, one on either side about equally distant. Suppose furthermore that his mental black box assigns a 51% probability to the proposition that the bucket on the right side is closer to him than the bucket on the left side.
The question, then, is what should the ass consciously feel about the probability that the bucket on the right is closest? I propose that given that his black box assigns a 51% probability to this, he should go to the bucket on the right. But given that he should go to the bucket on the right, he should go there without delay, without a hesitating step, because hesitation is merely a waste of time. But how can the ass go there without delay if he is consciously feeling that the probability is 51% that the bucket on the right is closest? That feeling will cause within him uncertainty and hesitation and will slow him down. Therefore it is best if the ass consciously is absolutely convinced that the bucket on the right is closest. This conscious feeling of certainty will speed his step and get him to the water quickly.
So it is best for Buridan’s ass that his consciously felt degrees of certainty are great exaggerations of his mental black box’s probability assignments. I think this generalizes. We should consciously feel much more certain of things than we really are, in order to get ourselves moving.
In fact, if Buridan’s ass’s mental black box assigns exactly 50% probability to the right bucket being the closer one, the mental black box should in effect flip a coin and then delude the conscious self to become entirely convinced that the right (or, depending on the coin flip, the left) bucket is the closest and act accordingly.
This can be applied to the reactions of prey to predators. It is so costly for a prey animal to be eaten, and relatively so not very costly for the prey animal merely to waste a bit of its time running, that a prey animal is most likely to survive to reproduce if it is in the habit of completely believing that there is a predator after it far more often than there really is a predator after it. Even if possible-predator-signals in the environment actually signify predators 10% of the time or less, since the prey animal never knows which of those signals is the predator, the prey needs to run for its very life every single time it senses the possible-predator-signal. For it to do this, it must be fully mentally committed to the proposition that there is in fact a predator after it. There is no reason for the prey animal to have any less than full belief that there is a predator after it, each and every time it senses a possible predator.
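A minimal expected-cost sketch of that asymmetry (the numbers are invented; only their ordering matters):

```python
# Invented costs, in arbitrary units: being eaten is vastly worse than a
# wasted sprint, and only a small fraction of signals are real predators.
cost_eaten = 1_000_000   # cost of ignoring a signal when a predator is real
cost_sprint = 1          # cost of running when there was no predator
p_predator = 0.10        # fraction of possible-predator-signals that are real

expected_cost_if_always_run = cost_sprint             # pay the sprint every time
expected_cost_if_ignore = p_predator * cost_eaten     # risk being eaten

# Always running wins by orders of magnitude, so full commitment to "there
# is a predator" is the optimal policy even though it is usually false.
print(expected_cost_if_always_run, expected_cost_if_ignore)   # 1 vs 100000.0
```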
I don’t agree with this conflation of commitment and belief. I’ve never had to run from a predator, but when I run to catch a train, I am fully committed to catching the train, although I may be uncertain about whether I will succeed. In fact, the less time I have, the faster I must run, but the less likely I am to catch the train. That only affects my decision to run or not. On making the decision, belief and uncertainty are irrelevant, intention and action are everything.
Maybe some people have to make themselves believe in an outcome they know to be uncertain, in order to achieve it, but that is just a psychological exercise, not a necessary part of action.
The question is not whether there are some examples of commitment which do not involve belief. The question is whether there are (some, many) examples where really, absolutely full commitment does involve belief. I think there are many.
Consider what commitment is. If someone says, “you don’t seem fully committed to this”, what sort of thing might have prompted him to say this? It’s something like, he thinks you aren’t doing everything you could possibly do to help this along. He thinks you are holding back.
You might reply to this criticism, “I am not holding anything back. There is literally nothing more that I can do to further the probability of success, so there is no point in doing more—it would be an empty and possibly counterproductive gesture rather than being an action that truly furthers the chance of success.”
So the important question is, what can a creature do to further the probability of success? Let’s look at you running to catch the train. You claim that believing that you will succeed would not further the success of your effort. Well, of course not! I could have told you that! If you believe that you will succeed, you can become complacent, which runs the risk of slowing you down.
But if you believe that there is something chasing you, that is likely to speed you up.
Your argument is essentially, “my full commitment didn’t involve belief X, therefore you’re wrong”. But belief X is a belief that would have slowed you down. It would have reduced, not furthered, your chance of success. So of course your full commitment didn’t involve belief X.
My point is that it is often the case that a certain consciously felt belief would increase a person’s chances of success, given their chosen course of action. And in light of what commitment is—it is commitment of one’s self and one’s resources to furthering the probability of success—then if a belief would further a chance of success, then full, really full commitment will include that belief.
So I am not conflating conscious belief with commitment. I am saying that conscious belief can be, and often is, involved in the furthering of success, and therefore can be and often is a part of really full commitment. That is no more conflating belief with commitment than saying that a strong fabric makes a good coat conflates fabric with coats.
You’re right that my analogy was inaccurate: what corresponds in the train-catching scenario to believing there is a predator is my belief that I need to catch this train.
My point is that it is often the case that a certain consciously felt belief would increase a person’s chances of success, given their chosen course of action. And in light of what commitment is—it is commitment of one’s self and one’s resources to furthering the probability of success—then if a belief would further a chance of success, then full, really full commitment will include that belief.
A stronger belief may produce stronger commitment, but strong commitment does not require strong belief. The animal either flees or does not, because a half-hearted sprint will have no effect on the outcome whether a predator is there or not. Similarly, there’s no point making a half-hearted jog for a train, regardless of how much or little one values catching it.
Belief and commitment to act on the belief are two different parts of the process.
Of course, a lot of the “success” literature urges people to have faith in themselves, to believe in their mission, to cast all doubt aside, etc., and if a tool works for someone I’ve no urge to tell them it shouldn’t. But, personally, I take Yoda’s attitude: “Do, or do not.”
Yoda tutors Luke in Jedi philosophy and practice, which it will take Luke a while to learn. In the meantime, however, Luke is merely an unpolished human. And I am not here recommending a particular philosophy and practice of thought and behavior, but making a prediction about how unpolished humans (and animals) are likely to act. My point is not to recommend that Buridan’s ass should have an exaggerated confidence that the right bucket is closer, but to observe that we can expect him to have an exaggerated confidence, because, for reasons I described, exaggerated confidence is likely to have been selected for, since it is likely to have improved the chances of survival of asses who did not have the benefit of Yoda’s instruction.
So I don’t recommend, rather I expect that humans will commonly have conscious feelings of confidence which are exaggerated, and which do not truly reflect the output of the human’s mental black box, his mental machinery to which he does not have access.
Let me explain by the way what I mean here, because I’m saying that the black box can output a 51% probability for Proposition P while at the same time causing the person to be consciously absolutely convinced of the truth of P. This may be confusing, because I seem to be saying that the black box outputs two probabilities, a 51% probability for purposes of decisionmaking and a 100% probability for conscious consumption. So let me explain with an example what I mean.
Suppose you want to test Buridan’s ass to see what probability he assigns to the proposition that the right bucket is closer. What you can do is take the scenario and alter it as follows: introduce a mechanism which, with 4% probability, will move the right bucket further away than the left bucket before Buridan’s ass gets to it.
Now, if Buridan’s ass assigns a 100% probability that the right bucket is (currently) closer than the left bucket, then taking into account the introduced mechanism, this yields a 96% probability that, by the time the ass gets to it, the right bucket will still be closer to the ass’s starting position. But if Buridan’s ass assigns a 51% probability that the right bucket is (currently) closer than the left bucket, then taking the mechanism into account yields approximately a 49% probability (0.51 × 0.96 ≈ 0.49) that by the time the ass gets to it, the right bucket will be closer.
I am, of course, assuming that the ass is smart enough to understand and incorporate the mechanism into his calculations. Animals have eyes and ears and brains for a reason, so I don’t think it’s a stretch to suppose that there is some way to implement this scenario in a way that an ass really could understand.
So here’s how the test works. You observe that the ass goes to the bucket on the right. You are not sure whether the ass has assigned a 51% probability or a 100% probability to the right bucket being nearer. So you redo the experiment with the added mechanism. If the ass now (with the introduced mechanism) goes to the bucket on the left, then you can infer that the ass now believes that the probability that the right bucket will be closer by the time he reaches it is less than 50%. But that probability only changed by a few percentage points as a result of the added mechanism. Therefore he must have assigned only slightly more than 50% probability to it to begin with.
And in this sort of way, you can elicit the ass’s probability assignments.
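The arithmetic behind that inference, spelled out as a small sketch (under the 4% assumption introduced above):

```python
p_mechanism = 0.04   # chance the mechanism moves the right bucket further away

def p_right_still_closer(p_right_closer_now):
    # The right bucket is closer on arrival only if it is closer now AND the
    # mechanism does not fire.
    return p_right_closer_now * (1 - p_mechanism)

print(p_right_still_closer(1.00))   # 0.96   -> a fully convinced ass still goes right
print(p_right_still_closer(0.51))   # 0.4896 -> just below 50%, so the ass goes left
```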
The ass’s conscious state of mind, however, is something completely separate from this. If we grant the ass the gift of speech, the ass may well say, each time, “there’s not a shred of doubt in my mind that the right bucket is closer”, or “I am entirely confident that the left bucket is closer”.
My point being that we may well be like the ass, and introspective examination of our own conscious state of mind may fail to reveal the actual probabilities that our mental black boxes have assigned to events. It may instead reveal only overconfident delusions that the black box has instilled in the conscious mind for the purpose of encouraging quick action.
Thanks for the lengthy answer. Still, why is it impossible to calibrate people in general, looking at how often they get the answer right, and then using them as a device for measuring probabilities? If a person is right on approximately 80% of the issues about which he says he’s “sure”, then why not translate his next “sure” into an 80% probability? That doesn’t seem arbitrary to me. There may be inconsistency between measurements using different people, but strictly speaking, thermometers and clocks also sometimes disagree.
I do discuss this exact point in the above lengthy comment, and I allow for this possibility. Here is the relevant part:
The first possible path towards accurate calibration is when the same person performs essentially the same judgment many times, and from the past performance we extract the frequency with which their brain tends to produce the right answer. If this level of accuracy remains roughly constant in time, then it makes sense to attach it as the probability to that person’s future judgments on the topic. This approach treats the relevant operations in the brain as a black box whose behavior, being roughly constant, can be subjected to such extrapolation.
Now clearly, the critical part is to ensure that the future judgments are based on the same parts of the person’s brain and that the relevant features of these parts, as well as the problem being solved, remain unchanged. In practice, these requirements can be satisfied by people who have reached the peak of ability achievable by learning from experience in solving some problem that repeatedly occurs in nearly identical form. Still, even in the best case, we’re talking about a very limited number of questions and people here.
I know you have limited it to repeated judgments about essentially the same question. I was rather asking why, and I am still not sure whether I interpret it correctly. Is it that the judgments themselves are possibly produced by different parts of the brain, or that the person’s self-evaluations of certainty are produced by different parts of the brain, or both? And if so, so what?
Imagine a test is done on a particular person. During five consecutive years he is asked a lot of questions (of all different types), and he has to give an answer and a subjective feeling of certainty. After that, we see that the answers which he labeled as “almost certain” were right in 83%, 78%, 81%, 84% and 85% of cases in the five years. Let’s even say that the experimenters were careful enough to divide the questions into different topics, and establish that his “almost certain” answers about medicine were right 94% of the time on average and his “almost certain” answers about politics were right 56% of the time on average. All other topics were near the overall average.
Do you 1) maintain that such stable results are very unlikely to happen, or 2) that even if most people can be calibrated in such a way, it still doesn’t justify using them for measuring probabilities?
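A minimal sketch of how such a record could be tabulated by topic (all data invented for illustration):

```python
from collections import defaultdict

# Invented record of (topic, verbal confidence label, whether the answer was right).
record = [
    ("medicine", "almost certain", True),
    ("medicine", "almost certain", True),
    ("medicine", "almost certain", False),
    ("politics", "almost certain", True),
    ("politics", "almost certain", False),
    ("politics", "almost certain", False),
]

tally = defaultdict(lambda: [0, 0])   # topic -> [number right, number asked]
for topic, label, correct in record:
    if label == "almost certain":
        tally[topic][0] += int(correct)
        tally[topic][1] += 1

for topic, (right, total) in tally.items():
    print(f"{topic}: 'almost certain' answers right {right}/{total} = {right/total:.0%}")
```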
I know you have limited it to repeated judgments about essentially the same question. I was rather asking why, and I am still not sure whether I interpret it correctly. Is it that the judgments themselves are possibly produced by different parts of the brain, or that the person’s self-evaluations of certainty are produced by different parts of the brain, or both? And if so, so what?
We don’t really know, but it could certainly be both, and also it may well be that the same parts of the brain are not equally reliable for all questions they are capable of processing. Therefore, while simple inductive reasoning tells us that consistent accuracy on the same problem can be extrapolated, there is no ground to generalize to other questions, since they may involve different parts of the brain, or the same part functioning in different modes that don’t have the same accuracy.
Unless, of course, we cover all such various parts and modes and obtain some sort of weighted average over them, which I suppose is the point of your thought experiment, of which more below.
Do you 1) maintain that such stable results are very unlikely to happen, or 2) that even if most people can be calibrated in such a way, it still doesn’t justify using them for measuring probabilities?
If the set of questions remains representative—in the sense of querying the same brain processes with the same frequency—the results could turn out to be fairly stable. This could conceivably be achieved by large and wide-ranging sets of questions. (I wonder if someone has actually done such experiments?)
However, the result could be replicated only if the same person is again asked similar large sets of questions that are representative with regards to the frequencies with which they query different brain processes. Relative to that reference class, it clearly makes sense to attach probabilities to answers, so, yes, here we would have another counterexample for my original claim, for another peculiar meaning of probabilities.
The trouble is that these probabilities would be useless for any purpose that doesn’t involve another similar representative set of questions. In particular, sets of questions about some particular topic that is not representative would presumably not replicate them, and thus they would be a very bad guide for betting that is limited to some particular topic (as it nearly always is). Thus, this seems like an interesting theoretical exercise, but not a way to obtain practically useful numbers.
(I should add that I never thought about this scenario before, so my reasoning here might be wrong.)
If there are any experimental psychologists reading this, maybe they can organise the experiment. I am curious whether people can indeed be calibrated on general questions.
I have read most of the responses and still am not sure whether to upvote or not. I doubt among several (possibly overlapping) interpretations of your statement. Could you tell to what extent the following interpretations really reflect what you think?
Confession of frequentism. Only sensible numerical probabilities are those related to frequencies, i.e. either frequencies of outcomes of repeated experiments, or probabilities derived from there. (Creative drawing of reference-class boundaries may be permitted.) Especially, prior probabilities are meaningless.
Any sensible numbers must be produced using procedures that ultimately don’t include any numerical parameters (maybe except small integers like 2,3,4). Any number which isn’t a result of such a procedure is labeled arbitrary, and therefore meaningless. (Observation and measurement, of course, do count as permitted procedures. Admittedly arbitrary steps, like choosing units of measurement, are also permitted.)
Degrees of confidence shall be expressed without reflexive thinking about them. Trying to establish a fixed scale of confidence levels (like impossible—very unlikely—unlikely—possible—likely—very likely—almost certain—certain), or actively trying to compare degrees of confidence in different beliefs is cheating, since such scales can be then converted into numbers using a non-numerical procedure.
The question of whether somebody is well calibrated is confused for some reason. Calibrating people has no sense. Although we may take the “almost certain” statements of a person and look at how often they are true, the resulting frequency has no sense for some reason.
Unlike #3, beliefs can be ordered or classified on some scale (possibly imprecisely), but assigning numerical values brings confusing connotations and should be avoided. Alternatively said, the meaning of subjective probabilities is preserved after monotonous rescaling.
Although, strictly speaking, human reasoning can be modelled as a Bayesian network where beliefs have numerical strengths, human introspection is poor at assessing their values. Declared values more likely depend on anchoring than on the real strength of the belief. Speaking about numbers actually introduces noise into reasoning.
Human reasoning cannot be modelled by Bayesian inference, not even in approximation.
That’s an excellent list of questions! It will help me greatly to systematize my thinking on the topic.
Before replying to the specific items you list, perhaps I should first state the general position I’m coming from, which motivates me to get into discussions of this sort. Namely, it is my firm belief that when we look at the present state of human knowledge, one of the principal sources of confusion, nonsense, and pseudosicence is physics envy, which leads people in all sorts of fields to construct nonsensical edifices of numerology and then pretend, consciously or not, that they’ve reached some sort of exact scientific insight. Therefore, I believe that whenever one encounters people talking about numbers of any sort that look even slightly suspicious, they should be considered guilty until proven otherwise—and this entire business with subjective probability estimates for common-sense beliefs doesn’t come even close to clearing that bar for me.
Now to reply to your list.
My answer to (1) follows from my opinion about (2).
In my view, a number that gives any information about the real world must ultimately refer, either directly or via some calculation, to something that can be measured or counted (at least in principle, perhaps using a thought-experiment). This doesn’t mean that all sensible numbers have to be derived from concrete empirical measurements; they can also follow from common-sense insight and generalization. For example, reading about Newton’s theory leads to the common-sense insight that it’s a very close approximation of reality under certain assumptions. Now, if we look at the gravity formula F=m1*m2/r^2 (in units set so that G=1), the number 2 in the denominator is not a product of any concrete measurement, but a generalization from common sense. Yet what makes it sensible is that it ultimately refers to measurable reality via a well-defined formula: measure the force between two bodies of known masses at distance r, and you’ll get log(m1*m2/F)/log(r) = 2.
Now, what can we make out of probabilities from this viewpoint? I honestly can’t think of any sensible non-frequentist answer to this question. Subjectivist Bayesian phrases such as “the degree of belief” sound to me entirely ghostlike unless this “degree” is verifiable via some frequentist practical test, at least in principle. In this sense, I do confess frequentism. (Though I don’t wish to subscribe to all the related baggage from various controversies in statistics, much of which is frankly over my head.)
That depends on the concrete problem under consideration, and on the thinker who is considering it. The thinker’s brain produces an answer alongside a more or less fuzzy feeling of confidence, and the human language has the capacity to express these feelings with about the same level of fuziness as that signal. It can be sensible to compare intuitive confidence levels, if such comparison can be put to a practical (i.e. frequentist) test. Eight ordered intuitive levels of certainty might perhaps be too much, but with, say, four levels, I could produce four lists of predictions labeled “almost impossible,” “unlikely,” “likely,” and “almost certain,” such that common-sense would tell us that, with near-certainty, those in each subsequent list would turn out to be true in ever greater proportion.
If I wish to express these probabilities as numbers, however, this is not a legitimate step unless the resulting numbers can be justified in the sense discussed above under (1) and (2). This requires justification both in the sense of defining what aspect of reality they refer to (where frequentism seems like the only answer), and guaranteeing that they will be accurate under empirical tests. If they can be so justified, then we say that the intuitive estimate is “well-calibrated.” However, calibration is usually not possible in practice, and there are only two major exceptions.
The first possible path towards accurate calibration is when the same person performs essentially the same judgment many times, and from the past performance we extract the frequency with which their brain tends to produce the right answer. If this level of accuracy remains roughly constant in time, then it makes sense to attach it as the probability to that person’s future judgments on the topic. This approach treats the relevant operations in the brain as a black box whose behavior, being roughly constant, can be subjected to such extrapolation.
The second possible path is reached when someone has a sufficient level of insight about some problem to cross the fuzzy limit between common-sense thinking and an actual scientific model. Increasingly subtle and accurate thinking about a problem can result in the construction of a mathematical model that approximates reality well enough that when applied in a shut-up-and-calculate way, it yields probability estimates that will be subsequently vindicated empirically.
(Still, deciding whether the model is applicable in some particular situation remains a common-sense problem, and the probabilities yielded by the model do not capture this uncertainty. If a well-established physical theory, applied by competent people, says that p=0.9999 for some event, common sense tells me that I should treat this event as near-certain—and, if repeated many times, that it will come out the unlikely way very close to one in 10,000 times. On the other hand, if p=0.9999 is produced by some suspicious model that looks like it might be a product of data-dredging rather than real insight about reality, common sense tells me that the event is not at all certain. But there is no way to capture this intuitive uncertainty with a sensible number. The probabilities coming from calibration of repeated judgment are subject to analogous unquantifiable uncertainty.)
There is also a third logical possibility, namely that some people in some situations have precise enough intuitions of certaintly that they can quantify them in an accurate way, just like some people can guess what time it is with remarkable precision without looking at the clock. But I see little evidence of this occurring in reality, and even if it does, these are very rare special cases.
I disagree with this, as explained above. Calibration can be done successfully in the special cases I mentioned. However, in cases where it cannot be done, which includes the great majority of the actual beliefs and conclusions made by human brains, devising numerical probabilities makes no sense.
This should be clear from the answer to (3).
[Continued in a separate comment below due to excessive length.]
I’ll point out here that reversed stupidity is not intelligence, and that for every possible error, there is an opposite possible error.
In my view, if someone’s numbers are wrong, that should be dealt with on the object level (e.g. “0.001 is too low”, with arguments for why), rather than retreating to the meta level of “using numbers caused you to err”. The perspective I come from is wanting to avoid the opposite problem, where being vague about one’s beliefs allows one to get away without subjecting them to rigorous scrutiny. (This, too, by the way, is a major hallmark of pseudoscience.)
But I’ll note that even as we continue to argue under opposing rhetorical banners, our disagreement on the practical issue seems to have mostly evaporated; see here for instance. You also do admit in the end that fear of poor calibration is what is underlying your discomfort with numerical probabilities:
As a theoretical matter, I disagree completely with the notion that probabilities are not legitimate or meaningful unless they’re well-calibrated. There is such a thing as a poorly-calibrated Bayesian; it’s a perfectly coherent concept. The Bayesian view of probabilities is that they refer specifically to degrees of belief, and not anything else. We would of course like the beliefs so represented to be as accurate as possible; but they may not be in practice.
If my internal “Bayesian calculator” believes P(X) = 0.001, and X turns out to be true, I’m not made less wrong by having concealed the number, saying “I don’t think X is true” instead. Less embarrassed, perhaps, but not less wrong.
komponisto:
Trouble is, sometimes numbers can be not even wrong, with their very definition lacking logical consistency or any defensible link with reality. It is that category that I am most concerned with, and I believe that it sadly occurs very often in practice, with entire fields of inquiry sometimes degenerating into meaningless games with such numbers. My honest impression is that in our day and age, such numerological fallacies have been responsible for much greater intellectual sins than the opposite fallacy of avoiding scrutiny by excessive vagueness, although the latter phenomenon is not negligible either.
Here we seem to be clashing about terminology. I think that “poor calibration” is too much of a euphemism for the situations I have in mind, namely those where sensible calibration is altogether impossible. I would instead use some stronger expression clarifying that the supposed “calibration” is done without any valid basis, not that the result is poor because some unfortunate circumstance occurred in the course of an otherwise sensible procedure.
As I explained in the above lengthy comment, I simply don’t find numbers that “refer specifically to degrees of belief, and not anything else” a coherent concept. We seem to be working with fundamentally different philosophical premises here.
Can these numerical “degrees of belief” somehow be linked to observable reality according to the criteria I defined in my reply to the points (1)-(2) above? If not, I don’t see how admitting such concepts can be of any use.
But if you do this 10,000 times, and the number of times X turns out to be true is small but nowhere close to 10, you are much more wrong than if you had just been saying “X is highly unlikely” all along.
On the other hand, if we’re observing X as a single event in isolation, I don’t see how this tests your probability estimate in any way. But I suspect we have some additional philosophical differences here.
[Continued from the parent comment.]
I have revised my view about this somewhat thanks to a shrewd comment by xv15. The use of unjustified numerical probabilities can sometimes be a useful figure of speech that will convey an intuitive feeling of certainty to other people more faithfully than verbal expressions. But the important thing to note here is that the numbers in such situations are mere figures of speech, i.e. expressions that exploit various idiosyncrasies of human language and thinking to transmit hard-to-convey intuitive points via non-literal meanings. It is not legitimate to use these numbers for any other purpose.
Otherwise, I agree. Except in the above-discussed cases, subjective probabilities extracted from common-sense reasoning are at best an unnecessary addition to arguments that would be just as valid and rigorous without them. At worst, they can lead to muddled and incorrect thinking based on a false impression of accuracy, rigor, and insight where there is none, and ultimately to numerological pseudoscience.
Also, we still don’t know whether and to what extent various parts of our brains involved in common-sense reasoning approximate Bayesian networks. It may well be that some, or even all of them do, but the problem is that we cannot look at them and calculate the exact probabilities involved, and these are not available to introspection. The fallacy of radical Bayesianism that is often seen on LW is in the assumption that one can somehow work around this problem so as to meaningfully attach an explicit Bayesian procedure and a numerical probability to each judgment one makes.
Note also that even if my case turns out to be significantly weaker under scrutiny, it may still be a valid counterargument to the frequently voiced position that one can, and should, attach a numerical probability to every judgment one makes.
So, that would be a statement of my position; I’m looking forward to any comments.
Suppose you have two studies, each of which measures and gives a probability for the same thing. The first study has a small sample size, and a not terribly rigorous experimental procedure; the second study has a large sample size, and a more thorough procedure. When called on to make a decision, you would use the probability from the larger study. But if the large study hadn’t been conducted, you wouldn’t give up and act like you didn’t have any probability at all; you’d use the one from the small study. You might have to do some extra sanity checks, and your results wouldn’t be as reliable, but they’d still be better than if you didn’t have a probability at all.
A probability assigned by common-sense reasoning is to a probability that came from a small study, as a probability from a small study is to a probability from a large study. The quality of probabilities varies continuously; you get better probabilities by conducting better studies. By saying that a probability based only on common-sense reasoning is meaningless, I think what you’re really trying to do is set a minimum quality level. Since probabilities that’re based on studies and calculation are generally better than probabilities that aren’t, this is a useful heuristic. However, it is only that, a heuristic; probabilities based on common-sense reasoning can sometimes be quite good, and they are often the only information available anywhere (and they are, therefore, the best information). Not all common-sense-based probabilities are equal; if an expert thinks for an hour and then gives a probability, without doing any calculation, then that probability will be much better than if a layman thinks about it for thirty seconds. The best common-sense probabilities are better than the worst statistical-study probabilities; and besides, there usually aren’t any relevant statistical calculations or studies to compare against.
I think what’s confusing you is an intuition that if someone gives a probability, you should be able to take it as-is and start calculating with it. But suppose you had collected five large studies, and someone gave you the results of a sixth. You wouldn’t take that probability as-is, you’d have to combine it with the other five studies somehow. You would only use the new probability as-is if it was significantly better (larger sample, more trustworthy procedure, etc) than the ones you already had, or you didn’t have any before. Now if there are no good studies, and someone gives you a probability that came from their common-sense reasoning, you almost certainly have a comparably good probability already: your own common-sense reasoning. So you have to combine it. So in a sense, those sorts of probabilities are less meaningful—you discard them when they compete with better probabilities, or at least weight them less—but there’s still a nonzero amount of meaning there.
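To make the “combine it” step a bit more concrete, here is a minimal sketch of one way the pooling could go. The probabilities and the “effective sample size” weights are made-up placeholders, and a more careful treatment would pool on log-odds rather than on raw probabilities:

```python
# A toy pooling rule: weight each probability estimate by a rough
# "effective sample size", so a large study counts for more than a small
# one, and a quick common-sense judgment counts for less still.
# All numbers below are hypothetical.

def pool_estimates(estimates):
    """estimates: list of (probability, effective_sample_size) pairs."""
    total_weight = sum(weight for _, weight in estimates)
    return sum(p * weight for p, weight in estimates) / total_weight

large_study  = (0.30, 1000)  # large sample, careful procedure
small_study  = (0.45, 50)    # small sample, sloppier procedure
common_sense = (0.60, 5)     # my own reasoning, weighted least

print(pool_estimates([large_study, small_study, common_sense]))  # about 0.31
print(pool_estimates([small_study, common_sense]))               # about 0.46
```

The toy numbers are only meant to show that the common-sense estimate never gets discarded: it simply receives a small weight whenever better sources are available, and dominates when they aren’t.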
(Aside: I’ve been stuck for awhile on an article I’m writing called “What Probability Requires”, dealing with this same topic, and seeing you argue the other side has been extremely helpful. I think I’m unstuck now; thank you for that.)
After thinking about your comment, I think this observation comes close to the core of our disagreement:
By saying that a probability based only on common-sense reasoning is meaningless, I think what you’re really trying to do is set a minimum quality level.
Basically, yes. More specifically, the quality level I wish to set is that the numbers must give more useful information than mere verbal expressions of confidence. Otherwise, their use at best simply adds nothing useful, and at worst leads to fallacious reasoning encouraged by a false feeling of accuracy.
Now, there are several possible ways to object to my position:
The first is to note that even if not meaningful mathematically, numbers can serve as communication-facilitating figures of speech. I have conceded this point.
The second way is to insist on an absolute principle that one should always attach numerical probabilities to one’s beliefs. I haven’t seen anything in this thread (or elsewhere) yet that would shake my belief in the fallaciousness of this position, or even provide any plausible-seeming argument in favor of it.
The third way is to agree that sometimes attaching numerical probabilities to common-sense judgments makes no sense, but on the other hand, in some cases common-sense reasoning can produce numerical probabilities that will give more useful information than just fuzzy words. After the discussion with mattnewport and others, I agree that there are such cases, but I still maintain that these are rare exceptions. (In my original statement, I took an overly restrictive notion of “common sense”; I admit that in some cases, thinking that could reasonably be described as such is indeed precise enough to produce meaningful numerical probabilities.)
So, to clarify, which exact position do you take in this regard? Or would your position require a fourth item to summarize fairly?
I agree that there is a non-zero amount of meaning, but the question is whether it exceeds what a simple verbal statement of confidence would convey. If I can’t take a number and start calculating with it, what good is it? (Except for the caveat about possible metaphorical meanings of numbers.)
My response to this ended up being a whole article, which is why it took so long. The short version of my position is that we should attach numbers to beliefs as often as possible, but for instrumental reasons rather than on principle.
As a matter of fact I can think of one reason—a strong reason in my view—that the consciously felt feeling of certainty is liable to be systematically and significantly exaggerated with respect to the true probability assigned by the person’s mental black box—the latter being something that we might in principle elicit through experimentation by putting the same subject through variants of a given scenario. (Think revealed probability assignment—similar to revealed preference as understood by the economists.)
The reason is that whole-hearted commitment is usually best whatever one chooses to do. Consider Buridan’s ass, but with the following alterations. Instead of hay and water, to make it more symmetrical suppose the ass has two buckets of water, one on either side about equally distant. Suppose furthermore that his mental black box assigns a 51% probability to the proposition that the bucket on the right side is closer to him than the bucket on the left side.
The question, then, is what should the ass consciously feel about the probability that the bucket on the right is closest? I propose that given that his black box assigns a 51% probability to this, he should go to the bucket on the right. But given that he should go to the bucket on the right, he should go there without delay, without a hesitating step, because hesitation is merely a waste of time. But how can the ass go there without delay if he is consciously feeling that the probability is 51% that the bucket on the right is closest? That feeling will cause within him uncertainty and hesitation and will slow him down. Therefore it is best if the ass consciously is absolutely convinced that the bucket on the right is closest. This conscious feeling of certainty will speed his step and get him to the water quickly.
So it is best for Buridan’s ass that his consciously felt degrees of certainty are great exaggerations of his mental black box’s probability assignments. I think this generalizes. We should consciously feel much more certain of things than we really are, in order to get ourselves moving.
In fact, if Buridan’s ass’s mental black box assigns exactly 50% probability to the right bucket being the closer one, the mental black box should in effect flip a coin and then delude the conscious self to become entirely convinced that the right (or, depending on the coin flip, the left) bucket is the closest and act accordingly.
This can be applied to the reactions of prey to predators. It is so costly for a prey animal to be eaten, and comparatively so cheap merely to waste a bit of time running, that a prey animal is most likely to survive to reproduce if it is in the habit of completely believing that there is a predator after it far more often than there really is one. Even if possible-predator-signals in the environment actually signify predators 10% of the time or less, since the prey animal never knows which of those signals is the predator, the prey needs to run for its very life every single time it senses the possible-predator-signal. For it to do this, it must be fully mentally committed to the proposition that there is in fact a predator after it. There is no reason for the prey animal to have any less than full belief that there is a predator after it, each and every time it senses a possible predator.
I don’t agree with this conflation of commitment and belief. I’ve never had to run from a predator, but when I run to catch a train, I am fully committed to catching the train, although I may be uncertain about whether I will succeed. In fact, the less time I have, the faster I must run, but the less likely I am to catch the train. That only affects my decision to run or not. On making the decision, belief and uncertainty are irrelevant, intention and action are everything.
Maybe some people have to make themselves believe in an outcome they know to be uncertain, in order to achieve it, but that is just a psychological exercise, not a necessary part of action.
The question is not whether there are some examples of commitment which do not involve belief. The question is whether there are (some, many) examples where really, absolutely full commitment does involve belief. I think there are many.
Consider what commitment is. If someone says, “you don’t seem fully committed to this”, what sort of thing might have prompted him to say this? It’s something like, he thinks you aren’t doing everything you could possibly do to help this along. He thinks you are holding back.
You might reply to this criticism, “I am not holding anything back. There is literally nothing more that I can do to further the probability of success, so there is no point in doing more—it would be an empty and possibly counterproductive gesture rather than being an action that truly furthers the chance of success.”
So the important question is, what can a creature do to further the probability of success? Let’s look at you running to catch the train. You claim that believing that you will succeed would not further the success of your effort. Well, of course not! I could have told you that! If you believe that you will succeed, you can become complacent, which runs the risk of slowing you down.
But if you believe that there is something chasing you, that is likely to speed you up.
Your argument is essentially, “my full commitment didn’t involve belief X, therefore you’re wrong”. But belief X is a belief that would have slowed you down. It would have reduced, not furthered, your chance of success. So of course your full commitment didn’t involve belief X.
My point is that it is often the case that a certain consciously felt belief would increase a person’s chances of success, given their chosen course of action. And in light of what commitment is—it is commitment of one’s self and one’s resources to furthering the probability of success—then if a belief would further a chance of success, then full, really full commitment will include that belief.
So I am not conflating conscious belief with commitment. I am saying that conscious belief can be, and often is, involved in the furthering of success, and therefore can be and often is a part of really full commitment. That is no more conflating belief with commitment than saying that a strong fabric makes a good coat conflates fabric with coats.
You’re right that my analogy was inaccurate: what corresponds in the train-catching scenario to believing there is a predator is my belief that I need to catch this train.
A stronger belief may produce stronger commitment, but strong commitment does not require strong belief. The animal either flees or does not, because a half-hearted sprint will have no effect on the outcome whether a predator is there or not. Similarly, there’s no point making a half-hearted jog for a train, regardless of how much or little one values catching it.
Belief and commitment to act on the belief are two different parts of the process.
Of course, a lot of the “success” literature urges people to have faith in themselves, to believe in their mission, to cast all doubt aside, etc., and if a tool works for someone I’ve no urge to tell them it shouldn’t. But, personally, I take Yoda’s attitude: “Do, or do not.”
Yoda tutors Luke in a Jedi philosophy and practice, which it will take Luke a while to learn. In the meantime, however, Luke is merely an unpolished human. And I am not here recommending a particular philosophy and practice of thought and behavior, but making a prediction about how unpolished humans (and animals) are likely to act. My point is not to recommend that Buridan’s ass should have an exaggerated confidence that the right bucket is closer, but to observe that we can expect him to have such confidence, because, for the reasons I described, exaggerated confidence is likely to have been selected for: it is likely to have improved the survival chances of asses who did not have the benefit of Yoda’s instruction.
So I don’t recommend, rather I expect that humans will commonly have conscious feelings of confidence which are exaggerated, and which do not truly reflect the output of the human’s mental black box, his mental machinery to which he does not have access.
Let me explain by the way what I mean here, because I’m saying that the black box can output a 51% probability for Proposition P while at the same time causing the person to be consciously absolutely convinced of the truth of P. This may be confusing, because I seem to be saying that the black box outputs two probabilities, a 51% probability for purposes of decisionmaking and a 100% probability for conscious consumption. So let me explain with an example what I mean.
Suppose you want to test Buridan’s ass to see what probability he assigns to the proposition that the right bucket is closer. What you can do is take the scenario and alter as follows: introduce a mechanism which, with 4% probability, will move the right bucket further than the left bucket before Buridan’s ass gets to it.
Now, if Buridan’s ass assigns a 100% probability that the right bucket is (currently) closer than the left bucket, then taking into account the introduced mechanism, this yields a 96% probability that, by the time the ass gets to it, the right bucket will still be closer to the ass’s starting position. But if Buridan’s ass assigns a 51% probability that the right bucket is (currently) closer than the left bucket, then taking into account the mechanism, this yields approximately a 49% probability (assuming I did the numbers right) that by the time the ass gets to it, the right bucket will be closer.
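A quick check of that arithmetic, just to confirm the numbers:

```python
# If the black box assigns probability p to "the right bucket is currently
# closer", and the mechanism moves it away with probability 0.04, then the
# right bucket is still closer on arrival with probability p * (1 - 0.04).
for p in (1.00, 0.51):
    print(p, "->", p * (1 - 0.04))
# prints 0.96 for p = 1.00, and 0.4896 (roughly 49%) for p = 0.51
```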
I am, of course, assuming that the ass is smart enough to understand and incorporate the mechanism into his calculations. Animals have eyes and ears and brains for a reason, so I don’t think it’s a stretch to suppose that there is some way to implement this scenario in a way that an ass really could understand.
So here’s how the test works. You observe that the ass goes to the bucket on the right. You are not sure whether the ass has assigned a 51% probability or a 100% probability to the right bucket being nearer. So you redo the experiment with the added mechanism. If the ass now (with the introduced mechanism) goes to the bucket on the left, then you can infer that the ass believes the probability that the right bucket will be closer by the time he reaches it is less than 50%. But that probability only changed by a few percentage points as a result of the added mechanism. Therefore he must have assigned only slightly more than 50% probability to it to begin with.
And in this sort of way, you can elicit the ass’s probability assignments.
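Here is a rough sketch of that elicitation procedure as a computation. The decision rule is my own simplification (the ass goes right exactly when the right bucket is more likely than not to still be the closer one on arrival), and the step size is arbitrary:

```python
# Increase the probability m that the mechanism moves the right bucket away,
# and watch for the point at which the ass switches to the left bucket.
# Inverting the decision rule at that switch point recovers (approximately)
# the probability the black box assigned in the first place.

def goes_right(p_right_now, m):
    """Assumed decision rule: go right iff the right bucket is more likely
    than not to still be the closer one once the ass arrives."""
    return p_right_now * (1 - m) > 0.5

def elicit(black_box_p, step=0.01):
    """Sweep the mechanism probability until the ass switches to the left,
    then invert the rule: at the switch point, p * (1 - m) is about 0.5."""
    m = 0.0
    while goes_right(black_box_p, m):
        m += step
    return 0.5 / (1 - m)

print(elicit(0.51))  # recovers a value close to 0.51
print(elicit(0.95))  # a confident ass switches only at a much larger m
```

The point is just that the behavioral switch point, not any introspective report, is what pins down the black box’s probability.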
The ass’s conscious state of mind, however, is something completely separate from this. If we grant the ass the gift of speech, the ass may well say, each time, “there’s not a shred of doubt in my mind that the right bucket is closer”, or “I am entirely confident that the left bucket is closer”.
My point being that we may well be like the ass, and introspective examination of our own conscious state of mind may fail to reveal the actual probabilities that our mental black boxes have assigned to events. It may instead reveal only overconfident delusions that the black box has instilled in the conscious mind for the purpose of encouraging quick action.
Thanks for the lengthy answer. Still, why is it impossible to calibrate people in general, looking at how often they get the answer right, and then to use them as a device for measuring probabilities? If a person is right on approximately 80% of the issues he says he’s “sure” about, then why not translate his next “sure” into an 80% probability? That doesn’t seem arbitrary to me. There may be inconsistency between measurements using different people, but strictly speaking, thermometers and clocks also sometimes disagree.
I do discuss this exact point in the above lengthy comment, and I allow for this possibility. Here is the relevant part:
Now clearly, the critical part is to ensure that the future judgments are based on the same parts of the person’s brain and that the relevant features of these parts, as well as the problem being solved, remain unchanged. In practice, these requirements can be satisfied by people who have reached the peak of ability achievable by learning from experience in solving some problem that repeatedly occurs in nearly identical form. Still, even in the best case, we’re talking about a very limited number of questions and people here.
I know you have limited it to repeated judgments about essentially the same question. I was rather asking why, and I am still not sure whether I interpret it correctly. Is it that the judgments themselves are possibly produced by different parts of the brain, or that the person’s self-evaluations of certainty are produced by different parts of the brain, or both? And if so, so what?
Imagine a test is done on a particular person. During five consecutive years he is asked a lot of questions (of all different types), and he has to give an answer and a subjective feeling of certainty. After that, we see that the answers which he has labeled as “almost certain” were right in 83%, 78%, 81%, 84% and 85% of cases in the five years. Let’s even say that the experimenters were careful enough to divide the questions into different topics, and to establish that his “almost certain” answers about medicine were right 94% of the time on average and his “almost certain” answers about politics were right 56% of the time on average. All other topics were near the overall average.
Do you 1) maintain that such stable results are very unlikely to happen, or 2) hold that even if most people can be calibrated in such a way, it still doesn’t justify using them for measuring probabilities?
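To make the thought experiment concrete, here is a minimal sketch of how such a calibration table could be computed from a record of the person’s answers. The data format and the example records are hypothetical:

```python
from collections import defaultdict

def calibration_table(records):
    """records: iterable of (topic, confidence_label, was_correct) triples.
    Returns the observed hit rate for each (topic, label) pair."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for topic, label, correct in records:
        totals[(topic, label)] += 1
        hits[(topic, label)] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}

records = [
    ("medicine", "almost certain", True),
    ("medicine", "almost certain", True),
    ("politics", "almost certain", False),
    ("politics", "almost certain", True),
    # ... five years' worth of answers would go here
]

for (topic, label), rate in sorted(calibration_table(records).items()):
    print(f"{topic}, '{label}': right {rate:.0%} of the time")
```

The next “almost certain” on a given topic would then be read off the corresponding row of the table.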
prase:
We don’t really know, but it could certainly be both, and also it may well be that the same parts of the brain are not equally reliable for all questions they are capable of processing. Therefore, while simple inductive reasoning tells us that consistent accuracy on the same problem can be extrapolated, there is no ground to generalize to other questions, since they may involve different parts of the brain, or the same part functioning in different modes that don’t have the same accuracy.
Unless, of course, we cover all such various parts and modes and obtain some sort of weighted average over them, which I suppose is the point of your thought experiment, of which more below.
If the set of questions remains representative—in the sense of querying the same brain processes with the same frequency—the results could turn out to be fairly stable. This could conceivably be achieved by large and wide-ranging sets of questions. (I wonder if someone has actually done such experiments?)
However, the result could be replicated only if the same person is again asked similarly large sets of questions that are representative with regard to the frequencies with which they query different brain processes. Relative to that reference class, it clearly makes sense to attach probabilities to answers, so, yes, here we would have another counterexample to my original claim, involving another peculiar meaning of probabilities.
The trouble is that these probabilities would be useless for any purpose that doesn’t involve another similar representative set of questions. In particular, sets of questions about some particular topic that is not representative would presumably not replicate them, and thus they would be a very bad guide for betting that is limited to some particular topic (as it nearly always is). Thus, this seems like an interesting theoretical exercise, but not a way to obtain practically useful numbers.
(I should add that I never thought about this scenario before, so my reasoning here might be wrong.)
If there are any experimental psychologists reading this, maybe they can organise the experiment. I am curious whether people indeed can be calibrated on general questions.