Probabilities of zero are extremely load-bearing for natural latents in the exact case...
Dumb question: Can you sketch out an argument for why this is the case and/or why this has to be the case? I agree that ideally/morally this should be true, but if we’re already accepting a bounded degree of error elsewhere, what explodes if we accept it here?
Consider the exact version of the redundancy condition for latent Λ over X1,X2:
P[Λ,X1,X2]=P[Λ|X1]P[X1,X2]
and
P[Λ,X1,X2]=P[Λ|X2]P[X1,X2]
Combine these two and we get, for all Λ,X1,X2:
P[Λ|X1]=P[Λ|X2] OR P[X1,X2]=0
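Spelling out the "combine" step, as a small worked sketch in the same symbols (written in LaTeX, with X_1, X_2 standing for X1, X2): set the two right-hand sides equal and factor out the shared term.

```latex
\begin{align*}
P[\Lambda \mid X_1]\, P[X_1, X_2] &= P[\Lambda \mid X_2]\, P[X_1, X_2] \\
\bigl(P[\Lambda \mid X_1] - P[\Lambda \mid X_2]\bigr)\, P[X_1, X_2] &= 0
\end{align*}
```

A product is zero only if one of its factors is zero, which gives the either/or above. Note that the equality of conditionals is only forced where P[X1,X2] is exactly zero's complement: any strictly positive probability, however small, forces it.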
That’s the foundation for a conceptually-simple method for finding the exact natural latent (if one exists) given a distribution P[X1,X2]:
Pick a value X1,X2 which has nonzero probability, and initialize a set S containing that value. Then we must have P[Λ|X∈S]=P[Λ|X1]=P[Λ|X2] for all Λ.
Loop: add to S a new nonzero-probability value (X1′,X2) or (X1,X2′), where the value X2 or X1 (respectively) already appears in one of the pairs in S. Then P[Λ|X1′]=P[Λ|X∈S] or P[Λ|X2′]=P[Λ|X∈S], respectively. Repeat until there are no more candidate values to add to S.
Pick a new nonzero-probability pair not yet in any set and repeat with a new set, until all nonzero-probability values of X have been added to a set.
Now take the latent to be the equivalence class in which X falls.
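The procedure above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `exact_latent_classes` and the dict-based representation of P[X1,X2] are assumptions made for the example, and it uses union-find over X1 and X2 values rather than literally growing one set at a time (the resulting classes are the same connected components). It only builds the candidate classes; it does not check whether a natural latent actually exists.

```python
def exact_latent_classes(p):
    """Group nonzero-probability (x1, x2) pairs into equivalence classes.

    p: dict mapping (x1, x2) -> probability (hypothetical representation).
    Two pairs land in the same class when they are linked by a chain of
    nonzero-probability pairs sharing an x1 value or an x2 value.
    Returns a dict mapping each nonzero-probability pair to an int label.
    """
    parent = {}

    def find(node):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Each nonzero-probability pair forces P[L|x1] = P[L|x2], so link them.
    for (x1, x2), prob in p.items():
        if prob > 0:
            union(("x1", x1), ("x2", x2))

    labels = {}
    classes = {}
    for (x1, x2), prob in p.items():
        if prob > 0:
            root = find(("x1", x1))
            classes[(x1, x2)] = labels.setdefault(root, len(labels))
    return classes


# Block-diagonal example: two blocks with exact zeros between them.
block = {(a, b): 0.125 for a in (0, 1) for b in (0, 1)}
block.update({(a, b): 0.125 for a in (2, 3) for b in (2, 3)})
n_exact = len(set(exact_latent_classes(block).values()))

# Same distribution with one tiny off-block entry (not renormalized;
# only the zero pattern matters for this construction).
leaky = dict(block)
leaky[(1, 2)] = 1e-9
n_leaky = len(set(exact_latent_classes(leaky).values()))

print(n_exact, n_leaky)
```

In the block-diagonal example the two blocks stay separate classes, while the single 1e-9 entry in `leaky` links them into one class. That is the sense in which the exact construction depends on the zeros being exactly zero.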
Does that make sense?
As I also said in person, very much so!