It is impossible for A and ~A to both be evidence for B. If a lack of sabotage is evidence for a fifth column, then an actual sabotage event must be evidence against a fifth column.
This is not correct.
One explanation (call it A) for the absence of sabotage is that the Fifth Column is trying to be sneaky and to inflict maximum damage later, when no one expects it. The probability of that is greater than 0, so it is a legitimate potential explanation for the apparent absence of sabotage. But, on further thought, there is another possible explanation (call it B): there is no Fifth Column, and the absence of a Fifth Column produces an absence of sabotage. The probability of this is also greater than 0.
So here we have the event (Fifth Column exists) constituting evidence for (absence of sabotage); the probability may be low, but it is not zero. Surely it is fair to take it for granted that ~(Fifth Column exists) also constitutes evidence for (absence of sabotage). So that’s an example where an event and its negation can potentially both be evidence for something.
I think what you really mean to say is that since P(no sabotage) = P(no sabotage | Fifth Column) P(Fifth Column) + P(no sabotage | no Fifth Column) P(no Fifth Column), and since no sabotage has been observed, making P(no sabotage) = 1, this must imply that P(no sabotage | Fifth Column) P(Fifth Column) = 1 - P(no sabotage | no Fifth Column) P(no Fifth Column).
If we then make the (perhaps unwarranted) assumption that the prior probabilities are equal, i.e. P(Fifth Column) = P(no Fifth Column), then a maximum a posteriori decision rule reduces to selecting whichever hypothesis has the larger conditional probability of producing no sabotage. From there, our intuitions about basic logic say it doesn’t make sense to assign probabilities such that (no Fifth Column) is less likely to produce no sabotage than (Fifth Column) is, and this is what creates the effect you are noting: that some event A and its negation ~A shouldn’t both be evidence for the same thing.
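To make the machinery concrete, here is a minimal sketch in Python. The numbers (a 50/50 prior and the two likelihoods) are assumptions of mine, chosen only so that no sabotage is likelier under no Fifth Column, in line with the intuition above:

```python
# A minimal sketch of the update P(Fifth Column | no sabotage) via Bayes' theorem.
# All numbers are illustrative assumptions, not claims about the historical case.

p_fc = 0.5                # prior P(Fifth Column); equal priors, as assumed above
p_ns_given_fc = 0.3       # P(no sabotage | Fifth Column): saboteurs might lie low
p_ns_given_no_fc = 0.999  # P(no sabotage | no Fifth Column): nothing to sabotage

# Law of total probability: P(no sabotage)
p_ns = p_ns_given_fc * p_fc + p_ns_given_no_fc * (1 - p_fc)

# Posterior after observing no sabotage
posterior = p_ns_given_fc * p_fc / p_ns
print(f"P(Fifth Column | no sabotage) = {posterior:.3f}")  # ~0.231, down from 0.5
```

Both likelihoods are nonzero, yet observing no sabotage still lowers the posterior from 0.5 to about 0.23: the update depends on the ratio of the likelihoods, not on whether each one is positive.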
Compactly, it’s only fair to claim that A and ~A cannot both be evidence for B in some very special situations. In general, though, A and ~A definitely can both serve as supporting evidence for B; it’s just that they will corroborate B to different degrees, and this may or may not be further offset by the prior probabilities of A and ~A.
But it is important not to assert the incorrect generalization that “It is impossible for A and ~A to both be evidence for B.”
Did you take into consideration the other replies to Tom McCabe’s comment, which raise the same question you do but offer the opposite answer? The appeal to intuition that a fifth column might be refraining from sabotage in order to carry out more effective sabotage later does not let you take both A and ~A as evidence for B. However you verbally justify it, you will still be Dutch-bookable and incoherent.
Without losing the generality of the theorems of probability, let me address your particular narrative: If you believe that, if a fifth column exists, it is of the type that will assuredly refrain from sabotage now in order to prepare a more devastating strike later; then observing sabotage (or no sabotage) cannot alter your probability that a fifth column exists.
Without losing the generality of the theorems of probability, let me address your particular narrative: If you believe that, if a fifth column exists, it is of the type that will assuredly refrain from sabotage now in order to prepare a more devastating strike later;
This is a fancy way of saying that you are assuming the fifth column’s intent is totally independent of the observation of sabotage: P(A | B) = P(A). That is, no evidence can update your position via Bayes’ theorem.
This is not what I am saying. I am saying that P(A | B) and P(A | ~B) can both be nonzero, and in the Bayesian sense this is what is meant by evidence. Either observing sabotage or failing to observe sabotage can, strictly speaking, corroborate the belief that there is a secret Fifth Column. If you make the further assumption that the actions of the Fifth Column are independent of your observations about sabotage, then yes, everything you said is correct.
My only point is that, in general, you cannot say that it is a rule of probability that A and ~A cannot both be evidence for B. You must be talking about specific assumptions involving independence for that to hold.
It also makes sense to think about A and ~A orthogonally, in the following sense: if these are my only two hypotheses, then if there is any best decision, it is because under some decision rule either A or ~A maximizes the a posteriori probability, but not both. If the posterior were equiprobable (50/50) across the hypotheses, then observing or not observing sabotage would change nothing. This could happen if you make the independence assumption above, but even if you don’t, the priors and conditional probabilities could still work out to exactly that case, and there would be no optimal belief in the Bayesian sense.
For a concrete example, suppose I flip a coin and if it is Heads, I will eat a tuna sandwich with probability 3⁄4 and a chicken sandwich with probability 1⁄4, and if it is Tails I will eat a turkey sandwich with probability 3⁄4 and a chicken sandwich with probability 1⁄4. Now suppose you only get to see what sandwich I select and then must make your best guess about what the coin showed. If I select a chicken sandwich, then you would believe that either Heads or Tails could serve as evidence for this decision. Neither result would be surprising to you (i.e., neither result would change your model) if you learned of it after I selected a chicken sandwich.
In this case, both A and ~A can serve as evidence for chicken, to the tune of 1⁄4 in each case. A is much stronger evidence for tuna, and ~A is much stronger evidence for turkey, but both, to some extent, are evidence for chicken.
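A quick computation makes this explicit. Here is a minimal sketch using exactly the probabilities stated in the example (the dictionary layout is just my own encoding of the setup):

```python
# Posterior over the coin given the observed sandwich, per the example above.
likelihood = {
    "Heads": {"tuna": 0.75, "chicken": 0.25, "turkey": 0.0},
    "Tails": {"tuna": 0.0, "chicken": 0.25, "turkey": 0.75},
}
prior = {"Heads": 0.5, "Tails": 0.5}

def posterior(sandwich):
    # Bayes' theorem with the uniform prior over the coin
    joint = {coin: likelihood[coin][sandwich] * prior[coin] for coin in prior}
    total = sum(joint.values())
    return {coin: p / total for coin, p in joint.items()}

print(posterior("tuna"))     # {'Heads': 1.0, 'Tails': 0.0}
print(posterior("chicken"))  # {'Heads': 0.5, 'Tails': 0.5} -- no update at all
```

The chicken case is the interesting one: both Heads and Tails give it the same nonzero likelihood, so observing chicken leaves the 50/50 prior untouched. In the likelihood sense both hypotheses “support” chicken; in the update sense neither gains.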
I’m not disagreeing with your claim about probability theory at all. I’m just saying that we don’t know that Warren assumed his observations about sabotage were independent of the existence of a Fifth Column. For all we know, he simply had such a strong prior belief that there was a Fifth Column (which may or may not have been rational in itself) that even after observing no sabotage, his decision rule still favored belief in the Fifth Column.
It’s not that he mistakenly thought that the Fifth Column would definitely act in one way or the other. It’s just that both no sabotage and sabotage were, to some degree, compatible with his strong prior that there was a Fifth Column… enough so that after converting it to a posterior it didn’t cause him to change his position.
A is evidence for B if P(B|A) > P(B); that is, learning A increases your belief in B. It is a fact of probability theory that P(B) = P(B|A)P(A) + P(B|¬A)P(¬A). If P(B|A) > P(B) and P(B|¬A) > P(B), then:
P(B) > P(B)P(A) + P(B)P(¬A)
P(B) > P(B)(P(A) + P(¬A))
P(B) > P(B)
Since A and ¬A are exhaustive and exclusive (so P(A) + P(¬A) = 1), this is a contradiction.
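Numerically, the same point: P(B) is a weighted average of P(B|A) and P(B|¬A), so it can never lie below both of them at once. A minimal sketch, with all probabilities drawn at random rather than taken from anything in this thread:

```python
import random

# P(B) = P(B|A)P(A) + P(B|¬A)P(¬A) is a convex combination of the two
# conditionals, so it always lies between them; both can never exceed it.
random.seed(0)
for _ in range(10_000):
    p_a = random.random()              # arbitrary P(A)
    p_b_given_a = random.random()      # arbitrary P(B|A)
    p_b_given_not_a = random.random()  # arbitrary P(B|¬A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    assert min(p_b_given_a, p_b_given_not_a) <= p_b <= max(p_b_given_a, p_b_given_not_a)
print("No counterexample: P(B) always lies between P(B|A) and P(B|¬A).")
```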
On the other hand, P(B|A) and P(B|¬A) being nonzero just means both A and ¬A are consistent with B—that is, A and ¬A are not disproofs of B.
Your definitions do not match mine, which come from here:
The key data-dependent term Pr(D | M) is a likelihood, and is sometimes called the evidence for model or hypothesis, M; evaluating it correctly is the key to Bayesian model comparison. The evidence is usually the normalizing constant or partition function of another inference, namely the inference of the parameters of model M given the data D.
The evidence for the hypothesis M is Pr(D | M), regardless of whether or not Pr(D) > Pr(D | M), at least according to that page and this statistics book sitting here at my desk (pages 184-186), and perhaps other sources.
If it’s just a war over definitions, then it’s not worth arguing. My point is that it’s misleading to act like that attribute you call ‘consistency’ doesn’t play a role in what could fuel reasoning like Warren’s above. It’s not about independence assumptions or mistakes about what can be evidence (do you really think Warren cared about the technical, Bayesian definition of evidence in his thinking?). It’s about understanding a person’s formation of prior probabilities in addition to the method by which they convert them to posteriors.
You’ve used “evidence” to refer to the probability P(D | M). We’re talking about the colloquial use of “evidence for the hypothesis” meaning an observation that increases the probability of the hypothesis. This is the sense in which we’ve been using “evidence” in the OP.
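To pin the two usages down side by side, here is a minimal sketch; the function names are mine, not any standard API, and the numbers are arbitrary:

```python
# Two different things people call "evidence", side by side.

def evidence_as_likelihood(p_d_given_m):
    """The statistics-textbook sense: the likelihood Pr(D | M) itself."""
    return p_d_given_m

def is_evidence_for(p_d_given_m, p_d_given_not_m):
    """The sense used in the OP: D is evidence for M iff observing D raises
    P(M), which (for 0 < P(M) < 1) holds iff Pr(D | M) > Pr(D | not M)."""
    return p_d_given_m > p_d_given_not_m

# Nonzero likelihood under M, but a higher likelihood under not-M:
print(evidence_as_likelihood(0.3))  # 0.3 -- "evidence" in the likelihood sense
print(is_evidence_for(0.3, 0.9))    # False -- evidence *against* M in the update sense
```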
If you draw 5 balls from an urn, and they’re all red, that’s evidence for the hypothesis that the next ball will be red, and so you conclude that the next one is more likely to be red than you thought before. If you draw 5 balls from an urn, and they’re all blue, that’s evidence against the hypothesis that the next one will be red, so you conclude that the next one is less likely to be red than you thought before.
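For instance, under a uniform Beta(1, 1) prior on the urn’s red fraction (a modeling assumption of mine, not anything asserted in the thread; this is just Laplace’s rule of succession), the updates run exactly as described:

```python
# Rule of succession under a uniform Beta(1, 1) prior on the red fraction:
# P(next red | r reds and b blues seen) = (r + 1) / (r + b + 2).

def p_next_red(reds, blues):
    return (reds + 1) / (reds + blues + 2)

print(p_next_red(0, 0))  # 0.500 -- before any draws
print(p_next_red(5, 0))  # 0.857 -- five reds: belief in red goes up
print(p_next_red(0, 5))  # 0.143 -- five blues: belief in red goes down
```

And, as the proof above demands, there is no prior under which every possible sequence of five draws raises the probability that the next ball is red.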
By the Bayesian proof above, however, your thought process is wrong if every possible sequence of 5 balls leads you to increase your belief that the next one will be red.
This is essentially what Warren did. If he observed sabotage he would have increased his belief in the existence of a fifth column, and yet, observing no sabotage he also increased his belief in the existence of a fifth column. Clearly, somewhere he’s made a mistake.
I see your point, and I think we mostly agree about everything. My only slight extra point is to suggest that perhaps Warren was trying to use his prior beliefs to explain the absence of sabotage, rather than trying to use the absence of sabotage to strengthen his prior beliefs. In retrospect, it’s likely that you’re right about Warren; the quote makes it seem that he did, in fact, think that the absence of sabotage increased the likelihood of a Fifth Column. In general, though, I think a lot of people make a mistake that has more to do with starting from an unreasonable prior, or with assuming their prior belief is independent of observations, than with the logical fallacy of letting conditioning on both A and ~A increase the probability of B.