Straight using Bayes’ theorem will result in overconfidence. Bias tends to correlate. If you guessed too high on one probability, it’s likely that you did on another. In addition, the bias will multiply with each piece of evidence. I’d certainly use Bayes’ theorem, but I’d try to correct for overconfidence at the end.
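The multiplying-bias point can be sketched numerically. The numbers below are illustrative assumptions of mine, not from the comment: suppose each likelihood ratio is overstated by the same modest factor, in the same direction.

```python
# Sketch with made-up numbers: if each of several likelihood ratios is
# overstated by the same modest factor, the error compounds multiplicatively
# through Bayes' theorem rather than averaging out.
prior_odds = 1.0          # 1:1 prior odds
true_lr = 2.0             # actual strength of each piece of evidence
bias = 1.3                # each estimate is 30% too strong, same direction
n_pieces = 5

honest_odds = prior_odds * true_lr ** n_pieces
biased_odds = prior_odds * (true_lr * bias) ** n_pieces

to_prob = lambda odds: odds / (1 + odds)
print(f"posterior with true LRs:   {to_prob(honest_odds):.3f}")        # 0.970
print(f"posterior with biased LRs: {to_prob(biased_odds):.3f}")        # 0.992
print(f"final odds overstated by:  {biased_odds / honest_odds:.1f}x")  # 3.7x
```

A 30% exaggeration per step ends up overstating the final odds almost fourfold after five pieces of evidence, which is the sense in which a uniform correction at the end has to be larger than the per-estimate bias.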
I would strongly encourage folks to adopt the view that we are always “using Bayes’ theorem” when reasoning.
That is, instead of saying “Use Bayes’ theorem, and then [after you’re done using Bayes’ theorem] correct for overconfidence”, say “Update on the evidence of studies showing that overconfidence is common”.
The distinction is important not for the particular result of the calculation, but for stamping out the notion that Bayes’ theorem is a “special trick” that is “sometimes useful”, rather than a mathematical model of inference itself.
I would strongly encourage folks to adopt the view that we are always “using Bayes’ theorem” when reasoning.
This is simply false. As I’m fond of pointing out, often the best judgment you can come up with is produced by entirely opaque processes in your head, whose internals are inaccessible to you no matter how hard you try to introspect on them. Pretending that you can somehow get around this problem and reduce all your reasoning to clear-cut Bayesianism is sheer wishful thinking.
Moreover, even when you are applying exact probabilistic reasoning in evaluating evidence, the numbers you work with often have a common-sense justification that you cannot reduce to Bayesian reasoning in any practically useful way. Knowledge of probability theory will let you avoid errors such as the prosecutor’s fallacy, but this leaves more fundamental underlying questions open. Are the experts who vouch for these forensic methods reliable or just quacks and pseudoscientists? Are the cops and forensic experts presenting real or doctored evidence, and are they telling the truth or perjuring themselves in cooperation with the prosecution? You can be all happy and proud that you’ve applied Bayes’ theorem correctly and avoided the common fallacies, and still your conclusion can be completely remote from reality because the numbers you’ve fed into the formula are a product of quackery, forgery, or perjury—and if you think you know a way to apply Bayesianism to detect these reliably, I would really like to hear it.
Given the context, I interpreted Komponisto’s comment as saying that to the extent that we reason correctly we are using Bayes’ theorem, not that we always reason correctly.
Even if the claim is worded like that, it implies (incorrectly) that correct reasoning should not involve steps based on opaque processes that we are unable to formulate explicitly in Bayesian terms. To take an example that’s especially relevant in this context, assessing people’s honesty, competence, and status is often largely a matter of intuitive judgment, whose internals are as opaque to your conscious introspection as the physics calculations that your brain performs when you’re throwing a ball. If you examine rigorously the justification for the numbers you feed into Bayes’ theorem, it will inevitably involve some such intuitive judgment that you can’t justify in Bayesian terms. (You could do that if you had a way of reverse-engineering the relevant algorithms implemented by your brain, of course, but that is as yet impossible.)
Of course, you can define “reasoning” to refer only to those steps in reaching the conclusion that are performed by rigorous Bayesian inference, and use some other word for the rest. But then to avoid confusion, we should emphasize that reaching any reliable conclusion about the facts in a trial (or almost any other context) requires a whole lot of things other than just “reasoning.”
Even if the claim is worded like that, it implies (incorrectly) that correct reasoning should not involve steps based on opaque processes that we are unable to formulate explicitly in Bayesian terms.
You misunderstand. There was no normative implication intended about explicit formulation. My claim is much weaker than you think (but also abstract enough that it may be difficult to understand how weak it is). I simply assert that Bayesian updating is a mathematical definition of what “inference” means, in the abstract. This does not say anything about the details of how humans process information, nor does it say anything about how mathematically explicit we “should” be about our reasoning in order for it to be valid. You concede everything you need to in order to agree with me when you write:
You could [justify intuitive judgements in Bayesian terms] if you had a way of reverse-engineering the relevant algorithms implemented by your brain,
In fact, this actually concedes more than necessary—because it could turn out that these algorithms are only approximately Bayesian, and my claim about Bayesianism as the ideal abstract standard would still hold (as indeed implied by the phrase “approximately Bayesian”).
Of course, this does in my view have the implication that it is appropriate for people who understand Bayesian language to use it when discussing their beliefs, especially in the context of a disagreement or other situation where one person doesn’t understand the other’s thought process. I suspect this is the real point of controversy here (cf. our previous arguments about using numerical probabilities).
Of course, this does in my view have the implication that it is appropriate for people who understand Bayesian language to use it when discussing their beliefs, especially in the context of a disagreement or other situation where one person doesn’t understand the other’s thought process. I suspect this is the real point of controversy here (cf. our previous arguments about using numerical probabilities).
Yes, the reason why I often bring up this point is the danger of spurious exactitude in situations like these. Clearly, if you are able to discuss the situation in Bayesian language while being well aware of the non-Bayesian loose ends involved, that’s great. The problem is that I often observe the tendency to pretend that these loose ends don’t exist. Moreover, the parts of reasoning that are opaque to introspection are typically the most problematic ones, and in most cases, their problems can’t be ameliorated by any formalism, but only on a messy case-by-case heuristic basis. The emphasis on Bayesian formalism detracts from these crucial problems.
If we actually knew how to reason correctly, we could program computers to do it. We reason correctly, better than computers, without understanding how we do it.
The specific example I gave is due more to treating random variables as if they’re independent. For example, you’re as likely to be off either way on A, and you’re as likely to be off either way on B, so for each of those you in fact gave the correct probability; but you’re more likely to be off the same way on both than in opposite ways, so you have to correct more when you use them together.
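A small simulation (my own construction, with arbitrary numbers) illustrates the correlated-errors point: each estimate is unbiased on its own, yet the naive product is biased high when the errors share a common component.

```python
import random

random.seed(42)

# Two probability estimates, each unbiased on its own, whose errors either
# share one common component ("correlated") or are drawn separately
# ("independent"). Multiplying as if independent inflates the joint estimate.
N = 200_000
p = 0.5          # true value of both P(A) and P(B)
sigma = 0.1      # spread of the estimation error

prod_correlated = 0.0
prod_independent = 0.0
for _ in range(N):
    e = random.gauss(0, sigma)              # one error shared by both estimates
    prod_correlated += (p + e) * (p + e)
    e1, e2 = random.gauss(0, sigma), random.gauss(0, sigma)
    prod_independent += (p + e1) * (p + e2)

print(f"true joint value:                 {p * p:.3f}")
print(f"mean product, correlated errors:  {prod_correlated / N:.3f}")   # ~ p^2 + sigma^2
print(f"mean product, independent errors: {prod_independent / N:.3f}")  # ~ p^2
```

With truly independent errors the product is right on average; with a shared error it is biased upward by roughly the error variance, so the joint estimate needs more correction than either input did.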
But yes. Bayes’ theorem is always the answer.
I’m confused. If all that is true, how do you know which direction to correct in?
I have only just come across this discussion (the original article referred to my work). The article
Fenton, N.E. and Neil, M. (2011), ‘Avoiding Legal Fallacies in Practice Using Bayesian Networks’
was published in the Australian Journal of Legal Philosophy 36, 114-151, 2011 (Journal ISSN 1440-4982). A pre-publication PDF can be found here:
https://www.eecs.qmul.ac.uk/~norman/papers/fenton_neil_prob_fallacies_June2011web.pdf
The point about the use of the likelihood ratio (to enable us to evaluate the probative value of evidence without having to propose subjective prior probabilities) is something I have increasingly grave doubts about. This idea has been oversold by the forensic statistics community. I am currently writing a paper which will show that, in practice, the likelihood ratio as a measure of evidence value can be fundamentally wrong. The example I focus on is the Barry George case. Here is a summary of what the article says:
One way to determine the probative value of any piece of evidence E (such as some forensic match of an item found at the crime scene to an item belonging to the defendant) is to use the likelihood ratio (LR). This is the ratio of two probabilities, namely the probability of E given the prosecution hypothesis (which might be ‘item at crime scene belongs to defendant’) divided by the probability of E given the alternative defence hypothesis (which might be ‘item at crime scene does not belong to defendant’). By Bayes’ theorem, if the LR is greater than 1 then the evidence supports the prosecution hypothesis, and if it is less than 1 it supports the defence hypothesis. If the LR is 1, i.e. the probabilities are equal, then the evidence is considered to be ‘neutral’ – it favours neither hypothesis over the other and so offers no probative value.

The simple relationship between the LR and the notion of ‘probative value of evidence’ actually only works when the two alternative hypotheses are mutually exclusive and exhaustive (i.e. each is the negation of the other). This is often not clearly stated by proponents of the LR, leading to widespread confusion about the notion of value of evidence. In many realistic situations it is extremely difficult to determine suitable hypotheses that are mutually exclusive and exhaustive; often an LR analysis is performed against hypotheses that are assumed to be so but are not. In such cases the LR has a much more complex impact on the probative value of evidence than assumed. We show (using Bayes’ theorem and Bayesian networks applied to simple, non-contentious examples) that for sensible alternative hypotheses – which are not exactly mutually exclusive and exhaustive – it is possible to have evidence with an LR of 1 that still has significant probative value. It is also possible to have evidence whose LR strongly favours one hypothesis, but whose probative value strongly favours the alternative hypothesis.
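The non-exhaustive-hypotheses point can be checked with a toy calculation. The priors and likelihoods below are illustrative numbers of my own, not figures from the paper or the Barry George case: the prosecution and defence hypotheses are mutually exclusive but a third possibility carries the remaining prior mass.

```python
# Toy example: Hp and Hd are mutually exclusive but NOT exhaustive;
# H3 covers everything else. Illustrative numbers only.
priors = {"Hp": 0.3, "Hd": 0.3, "H3": 0.4}
likelihoods = {"Hp": 0.5, "Hd": 0.5, "H3": 0.01}   # P(E | H)

# The LR computed only between Hp and Hd is 1 -> 'neutral' on the naive reading.
lr = likelihoods["Hp"] / likelihoods["Hd"]
print(f"LR(Hp vs Hd) = {lr}")

# Full Bayesian update over all three hypotheses.
evidence = sum(priors[h] * likelihoods[h] for h in priors)
posterior_hp = priors["Hp"] * likelihoods["Hp"] / evidence
print(f"P(Hp) before E: {priors['Hp']:.2f}, after E: {posterior_hp:.2f}")
```

Here the evidence E leaves the Hp-versus-Hd ratio untouched (LR = 1) yet raises the posterior probability of the prosecution hypothesis from 0.30 to about 0.49, because E counts strongly against the unmodelled third possibility. That is exactly the sense in which an "LR of 1" can still be probative.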
We consider the ramifications for the case of Barry George. The successful appeal against his conviction for the murder of Jill Dando was based primarily on the argument that the firearm discharge residue (FDR) evidence that was assumed to support the prosecution hypothesis at the original trial actually had an LR equal to 1 and hence was ‘neutral’. However, our review of the appeal transcript shows numerous inconsistencies and poorly defined hypotheses and evidence, such that it is not clear that the relevant elicited probabilities could have been based on mutually exclusive hypotheses. Hence, contrary to the Appeal conclusion, the probative value of the FDR evidence may not have been neutral.