Correct me if I’m wrong, but would the actual measure of the connection between A and B be more accurately summarized as K(A + B) < K(A) + K(B), then?
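If it helps to make that inequality concrete, here’s a rough sketch that uses zlib’s compressed length as a crude, computable stand-in for Kolmogorov complexity K (the sample strings are arbitrary choices for the example): when the two strings share structure, the concatenation compresses to well under the sum of the parts.

```python
import zlib

def c(data: bytes) -> int:
    """Compressed length in bytes -- a crude, computable proxy for K."""
    return len(zlib.compress(data, 9))

# Two related strings: b is a near-copy of a, so they share most of their structure.
a = b"the quick brown fox jumps over the lazy dog. " * 50
b = a.replace(b"fox", b"cat")

print("C(a) + C(b):", c(a) + c(b))
print("C(a + b):   ", c(a + b))  # noticeably smaller: the shared structure only gets paid for once
```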
I believe that’s an equivalent way to express “H(X) - H(X|Y) > 0” (i.e., positive mutual information, I(X;Y) > 0) and “P(A ∩ B) ≠ P(A) * P(B)”. Or at least, any one of the three can be derived from any of the others.
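For concreteness, a minimal sketch (the two 2×2 joint distributions are made up for the example) showing that H(X) - H(X|Y) is zero when the joint factors as P(x)·P(y) and positive when it doesn’t:

```python
import math

def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) for a joint distribution given as a 2D table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_x = entropy(px)
    # H(X|Y) = sum over y of P(y) * H(X | Y=y)
    h_x_given_y = sum(
        py[j] * entropy([joint[i][j] / py[j] for i in range(len(joint))])
        for j in range(len(py)) if py[j] > 0
    )
    return h_x - h_x_given_y

independent = [[0.3 * 0.6, 0.3 * 0.4],
               [0.7 * 0.6, 0.7 * 0.4]]   # P(x, y) = P(x) * P(y) by construction
dependent   = [[0.4, 0.1],
               [0.1, 0.4]]               # P(x, y) != P(x) * P(y)

print(mutual_information(independent))  # ~0.0
print(mutual_information(dependent))    # > 0 (about 0.28 bits)
```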
Note that the Kullback-Leibler divergence (also known as relative entropy) between the distributions of X and Y is the expected number of extra bits required to code data sampled from X when your compression algorithm is optimized for Y, which shows how these all relate.
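And a sketch of that coding interpretation, with two made-up distributions p (the true source) and q (what the code was optimized for): the overhead of coding p-samples with q’s code, i.e. cross-entropy minus entropy, comes out exactly equal to D_KL(p || q).

```python
import math

p = [0.5, 0.25, 0.25]   # true source distribution (arbitrary example values)
q = [0.8, 0.1, 0.1]     # distribution the code was optimized for

entropy_p     = -sum(pi * math.log2(pi) for pi in p)              # optimal bits per symbol
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))  # actual bits per symbol using q's code
kl_divergence =  sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

print(cross_entropy - entropy_p)  # extra bits per symbol...
print(kl_divergence)              # ...equals D_KL(p || q)
```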