The framing of this issue that makes the most sense to me is "P(E|B∪C) is a function of P(B):P(C)".
When I look at it this way, I disagree with the claim (in “Mennen’s ABC example”) that “[Bayesian updating] is not invariant when we aggregate outcomes”—I think it’s clearer to say that Bayesian updating is not well-defined when we aggregate outcomes.
Additionally, in “Interpreting Bayesian Networks”, this framing makes it clearer that the problem is that you used e1,2+e1,3 for P(E|B∪C), but they’re not the same thing! In essence, you’re taking a sum where you should be taking an average...
With this focus on (mis)calculating P(E|B∪C), the issue seems to me more like “a common error in applying Bayesian updates”, rather than a fundamental paradox in Bayesian updating itself. I agree with the takeaway “be careful when grouping together outcomes of a variable”—because grouping exposes one to committing this error—but I’m not sure I’m seeing the thing that makes you describe it as unintuitive?
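To make the "average, not sum" point concrete, here is a minimal sketch with made-up numbers (B and C disjoint): P(E|B∪C) is the prior-weighted average of P(E|B) and P(E|C), so it moves when the ratio P(B):P(C) moves.

```python
# Illustrative numbers only; the identity is just the law of total
# probability restricted to the event B ∪ C.
p_B, p_C = 0.3, 0.1          # priors P(B), P(C)
e_B, e_C = 0.8, 0.2          # likelihoods P(E|B), P(E|C)

# Correct: prior-weighted average, NOT the sum e_B + e_C.
p_E_given_BuC = (e_B * p_B + e_C * p_C) / (p_B + p_C)
print(p_E_given_BuC)         # 0.65, between e_B and e_C

# Flip the ratio P(B):P(C) and the same likelihoods give a different answer:
p_B, p_C = 0.1, 0.3
print((e_B * p_B + e_C * p_C) / (p_B + p_C))  # 0.35
```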
I like this framing.
This seems to imply that summarizing beliefs and summarizing updates are two distinct operations.
For summarizing beliefs we can still resort to summing:
$$\underbrace{\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix}}_{\text{Belief}} \;\to\; \underbrace{\begin{pmatrix} p_1 \\ p_2 + p_3 \end{pmatrix}}_{\text{Summarized belief}}$$
But for summarizing updates we need to use an average—which in the absence of prior information will be a simple average:
$$\underbrace{\begin{pmatrix} e_1 \\ e_2 \\ e_3 \end{pmatrix}}_{\text{Update}} \;\to\; \underbrace{\begin{pmatrix} e_1 \\ \frac{e_2 + e_3}{2} \end{pmatrix}}_{\text{Summarized update}}$$
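A quick sketch of both operations, with illustrative numbers: when the aggregated outcomes have equal priors (p2 = p3), summarizing first and then updating with the simple average (e2+e3)/2 reproduces the exact aggregated posterior; when the priors are unequal, it doesn't.

```python
def posterior(prior, like):
    """Normalize prior[i] * like[i] into a posterior distribution."""
    unnorm = [p * l for p, l in zip(prior, like)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

like = [0.9, 0.1, 0.5]                    # illustrative likelihoods

# Case 1: p2 == p3, so the simple average loses nothing.
prior = [0.5, 0.25, 0.25]
exact = posterior(prior, like)
exact_agg = [exact[0], exact[1] + exact[2]]          # update, then aggregate
summarized = posterior([prior[0], prior[1] + prior[2]],
                       [like[0], (like[1] + like[2]) / 2])  # aggregate, then update
print(exact_agg, summarized)              # identical: [0.75, 0.25] both times

# Case 2: p2 != p3 (and e2 != e3), so the simple average is lossy.
prior2 = [0.5, 0.4, 0.1]
exact2 = posterior(prior2, like)
exact_agg2 = [exact2[0], exact2[1] + exact2[2]]
summarized2 = posterior([prior2[0], prior2[1] + prior2[2]],
                        [like[0], (like[1] + like[2]) / 2])
print(exact_agg2, summarized2)            # now they disagree
```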
Annoyingly, and as you point out, this is not a perfect summary—we are definitely losing information here, and subsequent updates will not be as exact as if we were working with the disaggregated odds.
I still find it quite disturbing that the update after summarizing depends on prior information—but I can’t see how to do better than this, pragmatically speaking.
Right, I agree that for the update aggregation (e2+e3)/2 is better than e2+e3 (but still lossy). And the thing that p2:p3 affects is the weighting in the average—so if e2=e3 then the ps don’t matter! (which is a possible answer to your question of “how much aggregation/disaggregation can you do?”)
But yeah if e2 is very different from e3 then I don’t think there’s any way around it, because the effective ei could be one or the other depending on what the pi are.
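Numerically (illustrative values, deliberately chosen far apart): the effective likelihood of the merged outcome is the prior-weighted average of e2 and e3, so as the ratio p2:p3 flips it swings essentially all the way from one to the other.

```python
e2, e3 = 0.99, 0.01   # very different likelihoods for the two merged outcomes

def effective_likelihood(p2, p3):
    # Prior-weighted average: what the merged outcome "really" predicts.
    return (p2 * e2 + p3 * e3) / (p2 + p3)

print(effective_likelihood(0.99, 0.01))  # ~0.98, close to e2
print(effective_likelihood(0.01, 0.99))  # ~0.02, close to e3
print(effective_likelihood(0.5, 0.5))    # 0.5, the simple average
```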