I agree with the OP: simply defining a probability concept doesn’t by itself map it to our intuitions about it. For example, if we defined P(A|B) = P(AB) / (2P(B)), it wouldn’t correspond to our intuitions, and here’s why.
Intuitively, P(A|B) is the probability of A happening given that we know B has already happened. In other words, the only elementary outcomes we take into consideration now are those that correspond to B. Of those remaining elementary outcomes, the only ones that can lead to A are those that lie in AB. Their measure in absolute terms is P(AB); their measure relative to the elementary outcomes in B, however, is P(AB)/P(B).
Thus, P(A|B) is P(A) as it would be if the only elementary outcomes in existence were those yielding B. P(B) here is a normalizing coefficient: if we are evaluating the conditional probability of A with respect to a set of exhaustive and mutually exclusive experimental outcomes, as is done in Bayesian reasoning, dividing by P(B) amounts to renormalizing the elementary outcome space once B is fixed.
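To make the renormalization view concrete, here is a quick numeric check (a minimal sketch in Python; the die, the specific events, and the uniform measure are my own illustration, not part of the original argument):

```python
from fractions import Fraction

# Toy outcome space: a fair six-sided die (an assumed example for illustration).
outcomes = {1, 2, 3, 4, 5, 6}
prob = {w: Fraction(1, 6) for w in outcomes}  # uniform probability measure

A = {4, 5, 6}   # "the roll is at least 4"
B = {2, 4, 6}   # "the roll is even"

def P(event):
    """Measure of an event: sum of the probabilities of its elementary outcomes."""
    return sum(prob[w] for w in event)

# The usual definition: P(A|B) = P(AB) / P(B).
p_cond = P(A & B) / P(B)

# The renormalization view: discard every outcome outside B, rescale the
# survivors so they sum to 1, then measure the part of A that survived (AB).
renormalized = {w: prob[w] / P(B) for w in B}
p_renorm = sum(renormalized[w] for w in A & B)

assert p_cond == p_renorm == Fraction(2, 3)

# The hypothetical P(AB) / (2 P(B)) fails an obvious sanity check:
# conditioning B on itself should give 1, not 1/2.
assert P(B & B) / (2 * P(B)) == Fraction(1, 2)
```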
Now, a hopefully intuitive explanation of independent events.
By definition, A is independent of B if P(A|B) = P(A), or equivalently P(AB) = P(A)P(B). What does this mean in terms of measures?
It is easy to prove that if A is independent of B, then A is also independent of ~B: P(A|~B) = P(A ~B) / P(~B) = (P(A) - P(AB)) / (1 - P(B)) = (P(A) - P(A)P(B)) / (1 - P(B)) = P(A)(1 - P(B)) / (1 - P(B)) = P(A).
Therefore, A is independent of B iff P(A) = P(AB) / P(B) = P(A ~B) / P(~B), which implies that P(AB) / P(A ~B) = P(B) / P(~B).
Geometrically, this means that A intersects B and ~B in subsets whose measures are proportional to the measures of B and ~B themselves. So if P(B) = 1⁄4, then 1⁄4 of A (by measure) lies in B, and the remaining 3⁄4 lies in ~B. And if B and ~B are equally likely, then A is split equally between them.
And from an information-theoretic perspective, this geometric interpretation means that knowing whether B or ~B happened gives us no information about the likelihood of A, since A is equally likely in the renormalized outcome space either way.
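As a sanity check on the proportional-split picture, here is a small numeric example (again a sketch in Python; the coin-and-die construction is my own choice of two genuinely independent events):

```python
from fractions import Fraction
from itertools import product

# Joint experiment: toss a fair coin and roll a fair die (assumed example).
outcomes = set(product(["H", "T"], range(1, 7)))
prob = {w: Fraction(1, 12) for w in outcomes}  # uniform measure on the 12 pairs

def P(event):
    return sum(prob[w] for w in event)

A = {w for w in outcomes if w[0] == "H"}   # the coin came up heads
B = {w for w in outcomes if w[1] >= 5}     # the die shows 5 or 6
notB = outcomes - B

# Classic definition and both conditional forms agree:
assert P(A & B) == P(A) * P(B)
assert P(A & B) / P(B) == P(A)        # P(A|B)  = P(A)
assert P(A & notB) / P(notB) == P(A)  # P(A|~B) = P(A)

# Geometric picture: A is split between B and ~B in the same
# proportions as B and ~B themselves (here 1/3 vs 2/3).
assert P(A & B) / P(A) == P(B)
assert P(A & notB) / P(A) == P(notB)
```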
I feel like independence really is just a definition, or at least something close to it. I guess P(A|B) = P(A|~B) might be better. Independence is just another way of saying that A is just as likely regardless of B.
P(A|B) = P(A|~B) is equivalent to the classic definition of independence, and intuitively it means that “whether or not B happens doesn’t affect the likelihood of A happening”.
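Spelling out the equivalence (a short derivation using nothing beyond the law of total probability; p is just my shorthand for the common value of the two conditionals):

```latex
% Assume P(A|B) = P(A|~B) = p. By the law of total probability:
\begin{aligned}
P(A) &= P(A \mid B)\,P(B) + P(A \mid \neg B)\,P(\neg B) \\
     &= p\,\bigl(P(B) + P(\neg B)\bigr) = p .
\end{aligned}
% Hence P(A|B) = P(A), the classic definition. Conversely, P(A|B) = P(A)
% forces P(A|~B) = P(A) as well (the complement argument above), so the
% two conditionals are equal.
```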
I guess that since other basic probability concepts are defined in terms of set operations (union and intersection), while independence lacks a similarly obvious explanation in terms of sets and measure, I wanted to find one.
Basically, P(A|B) = 0 when A and B are disjoint, and P(A|C)/P(B|C) = P(A)/P(B) when A and B are subsets of C?
It’s better, but it’s still not that good. I have a sneaking suspicion that that’s the best I can do, though.
When A is a subset of C, P(A|C) = P(A).
Um, no?
...Oops, yes, said that without thinking (when A is a subset of C, P(A|C) = P(A)/P(C), not P(A)). But this is correct.