Probabilities of zero are extremely load-bearing for natural latents in the exact case, and probabilities near zero are load-bearing in the approximate case; if the distribution is zero nowhere, then it can only have a natural latent if the Xi’s are all independent (in which case the trivial variable is a natural latent).
I’m a bit confused why this is the case. It seems like in the theorems, the only thing “near zero” is that D_KL(joint, factorized) < epsilon ≈ 0. But you can satisfy this quite easily even with all probabilities > 0.
E.g. the trivial case where all variables are completely independent satisfies all the conditions of your theorem, but can clearly have every probability > 0. Even in nontrivial cases, this is pretty easy to arrange (e.g. by mixing irreducible noise into every variable).
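For concreteness, here's a minimal numerical sketch of that claim (my own illustration, assuming numpy): a full-support joint built as a product of marginals has D_KL(joint, factorized) exactly 0, even though every probability is strictly positive.

```python
import numpy as np

# Two independent binary variables with strictly positive marginals.
px = np.array([0.3, 0.7])
py = np.array([0.6, 0.4])
joint = np.outer(px, py)  # all four entries > 0

# "Factorized" distribution: product of the joint's own marginals.
factorized = np.outer(joint.sum(axis=1), joint.sum(axis=0))

# KL divergence between joint and factorized -- exactly 0 here.
dkl = np.sum(joint * np.log(joint / factorized))
print(dkl)  # 0.0 up to floating point
```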
Roughly speaking, all variables being completely independent is the only way to satisfy all the preconditions without zero-ish probabilities.
This is easiest to see if we use a “strong invariance” condition, in which each of the X_i must mediate between X_{\bar i} (all the other components) and Λ. Mental picture: equilibrium gas in a box, in which we can measure roughly the same temperature and pressure (Λ) from any little spatially-localized chunk of the gas (X_i). If I estimate a temperature of 10°C from one little chunk of the gas, then the probability of estimating 20°C from another little chunk must be approximately zero. The only case where that doesn’t imply near-zero probabilities is when all values of both chunks of gas always imply the same temperature, i.e. Λ only ever takes on one value (and is therefore informationally empty). And in that case, the only way the conditions are satisfied is if the chunks of gas are unconditionally independent.
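To spell out the pivotal step in symbols (my paraphrase, taking the exact/deterministic case for simplicity): suppose each chunk determines the latent, i.e. Λ = f(X_1) = g(X_2) with probability 1. If P[x_1, x_2] > 0 for every pair of values, then for any x_1, x_1' and any fixed x_2 we get f(x_1) = g(x_2) = f(x_1'), so f is constant and Λ carries no information.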
Hm, it sounds like you’re claiming that if each pair of X, Y, Z is independent conditioned on the third variable, and p(x, y, z) ≠ 0 for all x, y, z with nonzero p(x), p(y), p(z), then X, Y, Z are unconditionally independent, i.e. p(x, y, z) = p(x) p(y) p(z)?
I tried for a bit to show this but couldn’t prove it, let alone the general case without strong invariance. My guess is I’m probably missing something really obvious.
Yeah, that’s right.
The secret handshake is to start with “X is independent of Y given Z” and “X is independent of Z given Y”, expressed in this particular form:
P[X,Y,Z]=P[X|Z]P[Y,Z]=P[X|Y]P[Y,Z]
… then we immediately see that P[X|Z]=P[X|Y] for all X,Y,Z such that P[Y,Z]>0.
So if there are no zero probabilities, then P[X|Z]=P[X|Y] for all X,Y,Z.
That, in turn, implies that P[X|Z] takes on the same value for all Z. Since P[X] = Σ_Z P[X|Z] P[Z] is an average of those (all-identical) values P[X|Z], that common value must equal P[X]. Thus X and Z are independent. Likewise for X and Y. Finally, we leverage independence of Y and Z given X:
P[X,Y,Z]=P[Y|X]P[Z|X]P[X]
=P[Y]P[Z]P[X]
(A similar argument is in the middle of this post, along with a helpful-to-me visual.)
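If it helps, here's a small numerical check of the above (my own sketch, assuming numpy). It verifies that a full-support product distribution satisfies all three pairwise conditional independencies, and that the perfectly-correlated table X = Y = Z (which has zero probabilities) also satisfies all three without being a product; that's exactly why the zeros are load-bearing.

```python
import numpy as np

def ci_holds(p, cond_axis, tol=1e-9):
    """Check that the other two axes of joint table p are independent
    given the variable on cond_axis."""
    for k in range(p.shape[cond_axis]):
        slab = np.take(p, k, axis=cond_axis)  # 2D slice at cond value k
        pk = slab.sum()
        if pk == 0:
            continue  # conditional undefined at zero-probability values
        cond = slab / pk
        # Conditional independence <=> the slice equals the product
        # of its own marginals.
        if not np.allclose(cond, np.outer(cond.sum(axis=1), cond.sum(axis=0)), atol=tol):
            return False
    return True

def is_product(p, tol=1e-9):
    """Check that p factorizes as P[X] P[Y] P[Z]."""
    px, py, pz = p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1))
    return np.allclose(p, np.einsum('i,j,k->ijk', px, py, pz), atol=tol)

# (a) Full support: product of random strictly-positive marginals.
rng = np.random.default_rng(0)
px, py, pz = (rng.dirichlet(np.ones(2)) for _ in range(3))
prod = np.einsum('i,j,k->ijk', px, py, pz)

# (b) Zeros allowed: X = Y = Z exactly. Each pair is conditionally
# independent given the third, but the joint is nowhere near a product.
copy = np.zeros((2, 2, 2))
copy[0, 0, 0] = copy[1, 1, 1] = 0.5

for name, p in [("full-support product", prod), ("X=Y=Z (has zeros)", copy)]:
    cis = [ci_holds(p, ax) for ax in range(3)]
    print(f"{name}: pairwise CIs {cis}, product? {is_product(p)}")
```

So the three pairwise conditional independencies alone don't force unconditional independence; it's the no-zeros assumption that rules out cases like the last one.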
Right, the step I was missing was that P[X|Y] = P[X|Z] for all Y, Z implies P[X|Z] = P[X]. Thanks!