Is my understanding correct or am I missing something?
- A latent variable is a variable you can sample that gives you some subset of the mutual information between all the different X’s + possibly some independent extra “noise” unrelated to the X’s
- A natural latent is a variable you can sample which, in the limit of sampling, will give you all the mutual information between the X’s—nothing more or less
- E.g. in the biased die example, each die roll sample has, in expectation, the same information content, which is the die bias + random noise, and so the mutual info of n rolls is the die bias itself
(where “different X’s” above can be thought of as different (or the same) distributions over the observation space of X corresponding to the different sampling instances, perhaps a non-standard framing)
First bullet is correct, second bullet is close but not quite right. Just one sample of a natural latent will give you (approximately) all the mutual information between the X’s, and can give you some additional “noise” as well.
E.g. in the biased die example with many rolls, we can sample the bias given the rolls. Because that distribution is very pointy, the sample will be very close to the “true bias”, so that one sample will capture approximately-all of the mutual information between the rolls.
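To make the “pointy posterior” point concrete, here is a minimal sketch in Python. The 6-sided die, the particular bias values, the number of rolls, and the uniform Dirichlet prior are all illustrative assumptions of mine, not part of the example above; the point is just that with many rolls, a single sample from P[bias | rolls] lands very close to the true bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (my assumptions): a 6-sided die with a fixed "true" bias,
# rolled many times.
true_bias = np.array([0.30, 0.20, 0.15, 0.15, 0.10, 0.10])
n_rolls = 100_000
rolls = rng.choice(6, size=n_rolls, p=true_bias)

# Assume a uniform Dirichlet prior over the bias, so the posterior
# P[bias | rolls] is Dirichlet(1 + counts). One sample from that posterior
# is our candidate latent.
counts = np.bincount(rolls, minlength=6)
sampled_bias = rng.dirichlet(1 + counts)

# With many rolls the posterior is very concentrated ("pointy"), so a single
# sample lands very close to the true bias.
print(np.abs(sampled_bias - true_bias).max())  # small, shrinking roughly like 1/sqrt(n_rolls)
```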
(Note: I did skip a subtle step there—our natural latents need a stronger condition than just “close to the true bias” in this example, since the low-order bits of the latent could in principle contain a bunch of relevant information which the true bias doesn’t, and that would mess everything up. And indeed, it does mess everything up if we try to use e.g. the empirical frequencies rather than a sample from P[bias | X]: given all but one die roll and the empirical frequencies calculated from all of the die rolls, we can exactly calculate the outcome of the remaining die roll. That’s why we do the sampling thing; the little bit of noise introduced by sampling is load-bearing, since it wipes out the info in those low-order bits.
… but that’s a subtlety which you should not worry about until after the main picture makes sense conceptually.)
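For anyone who does want the subtlety spelled out, here is a toy demonstration of why the sampling noise is load-bearing, under the same illustrative setup as the sketch above (again, the die, bias values, roll count, and Dirichlet prior are my own assumptions): the exact empirical frequencies, together with all but one roll, pin down the remaining roll exactly, whereas a sample from P[bias | rolls] does not.

```python
import numpy as np

rng = np.random.default_rng(1)

true_bias = np.array([0.30, 0.20, 0.15, 0.15, 0.10, 0.10])
n_rolls = 10_000
rolls = rng.choice(6, size=n_rolls, p=true_bias)

# Candidate latent 1: the exact empirical frequencies over ALL n rolls.
empirical_freqs = np.bincount(rolls, minlength=6) / n_rolls

# Given the first n-1 rolls plus those frequencies, the last roll can be
# reconstructed exactly: its face is the one where the full counts exceed
# the counts over the first n-1 rolls.
counts_all = np.rint(empirical_freqs * n_rolls).astype(int)
counts_but_last = np.bincount(rolls[:-1], minlength=6)
recovered = int(np.argmax(counts_all - counts_but_last))
assert recovered == rolls[-1]  # the "latent" leaks an individual roll exactly

# Candidate latent 2: a single sample from P[bias | rolls] (uniform Dirichlet
# prior assumed). The sampling noise wipes out those low-order bits, so the
# sample tells you (approximately) nothing about any individual roll beyond
# the bias itself.
sampled_bias = rng.dirichlet(1 + np.bincount(rolls, minlength=6))
```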