So ‘latents’ are defined by their conditional distribution functions, whose shape is implicit in the factorization the latents need to satisfy? Meaning they don’t always have to look like P[Λ|X]; they can look like P[Λ], P[X|Λ], etc., right?
The key idea here is that, when “choosing a latent”, we’re not allowed to choose P[X]; P[X] is fixed/known/given, a latent is just a helpful tool for reasoning about or representing P[X]. So another way to phrase it is: we’re choosing our whole model P[X,Λ], but with a constraint on the marginal P[X]. P[Λ|X] then captures all of the degrees of freedom we have in choosing a latent.
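To make the “P[Λ|X] is the only degree of freedom” point concrete, here’s a small numpy sketch (with made-up table sizes): two different choices of conditional P[Λ|X] give two different joints P[X,Λ], but both joints are forced to have the same fixed marginal P[X].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: X takes 5 values, the latent takes 3.
n_x, n_lam = 5, 3
p_x = rng.dirichlet(np.ones(n_x))  # P[X]: fixed/known/given, not ours to choose

# Two different "choices of latent": two different conditionals P[Λ|X].
p_lam_given_x_a = rng.dirichlet(np.ones(n_lam), size=n_x)
p_lam_given_x_b = rng.dirichlet(np.ones(n_lam), size=n_x)

# Each choice pins down a full joint P[X, Λ] = P[X] P[Λ|X] ...
joint_a = p_x[:, None] * p_lam_given_x_a
joint_b = p_x[:, None] * p_lam_given_x_b

# ... and both joints marginalize back to the same fixed P[X].
assert np.allclose(joint_a.sum(axis=1), p_x)
assert np.allclose(joint_b.sum(axis=1), p_x)
```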
Now, we won’t typically represent the latent explicitly as P[Λ|X]. Typically we’ll choose latents such that P[X,Λ] satisfies some factorization(s), and those factorizations will provide a more compact representation of the distribution than two giant tables for P[X], P[Λ|X]. For instance, insofar as P[Λ,X] factors as P[Λ] ∏_i P[X_i|Λ], we might want to represent the distribution as P[Λ] and {P[X_i|Λ]} (both for analytic and computational purposes).
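A quick numerical illustration of the compactness claim (sizes are hypothetical): with a 4-valued latent and three 10-valued observables, the factored form P[Λ], {P[X_i|Λ]} stores far fewer numbers than the full joint table, and the joint is recoverable from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a latent with 4 values, three observables with 10 values each.
n_lam, n_x, n_obs = 4, 10, 3

# Factored representation: P[Λ] plus {P[X_i|Λ]} -- one small vector and a few small tables.
p_lam = rng.dirichlet(np.ones(n_lam))                              # shape (4,)
p_xi_given_lam = rng.dirichlet(np.ones(n_x), size=(n_obs, n_lam))  # shape (3, 4, 10)

# Reconstruct the full joint P[Λ, X_1, X_2, X_3] = P[Λ] prod_i P[X_i|Λ].
joint = p_lam[:, None, None, None] \
    * p_xi_given_lam[0][:, :, None, None] \
    * p_xi_given_lam[1][:, None, :, None] \
    * p_xi_given_lam[2][:, None, None, :]

# The factored form stores 4 + 3*4*10 = 124 numbers; the joint table stores 4*10^3 = 4000.
factored_size = p_lam.size + p_xi_given_lam.size
print(factored_size, joint.size)  # 124 4000
```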
I don’t get the ‘standard form’ business.
We’ve largely moved away from using the standard form anyway, I recommend ignoring it at this point.
Also is this post relevant to either of these statements, and if so, does that mean they only hold under strong redundancy?
Yup, that post proves the universal natural latent conjecture when strong redundancy holds (over 3 or more variables). Whether the conjecture fails when strong redundancy fails is an open question. But since the strong redundancy result, we’ve mostly shifted toward viewing strong redundancy as the usual condition to look for, and have focused less on weak redundancy.
Resampling
Also does all this imply that we’re starting out assuming that Λ shares a probability space with all the other possible latents, e.g. P[X,Λ,Λ′,Λ′′,…]? How does this square with a latent variable being defined by the CPD implicit in the factorization?
We conceptually start with the objects P[X], P[Λ|X], and P[Λ′|X]. (We’re imagining here that two different agents measure the same distribution P[X], but then they each model it using their own latents.) Given only those objects, the joint distribution P[X,Λ,Λ′] is underdefined—indeed, it’s unclear what such a joint distribution would even mean! Whose distribution is it?
One simple answer (unsure whether this will end up being the best way to think about it): one agent is trying to reason about the observables X, their own latent Λ, and the other agent’s latent Λ′ simultaneously, e.g. in order to predict whether the other agent’s latent is isomorphic to Λ (as would be useful for communication).
Since Λ and Λ′ are both latents, in order to define P[X,Λ,Λ′], the agent needs to specify P[Λ,Λ′|X]. That’s where the underdefinition comes in: only P[Λ|X] and P[Λ′|X] were specified up-front. So, we sidestep the problem: we construct a new latent Λ′′ such that P[Λ′′|X] matches P[Λ|X], but Λ′′ is independent of Λ′ given X. Then we’ve specified the whole distribution P[X,Λ′,Λ′′]=P[X]P[Λ′|X]P[Λ′′|X].
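The resampling construction above can be sketched numerically (with hypothetical table sizes): build the joint P[X,Λ′,Λ′′] = P[X] P[Λ′|X] P[Λ′′|X], then check that Λ′′ behaves exactly like Λ (same conditional given X) while being independent of Λ′ given X.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: X takes 5 values, each latent takes 3.
n_x, n_lam = 5, 3

p_x = rng.dirichlet(np.ones(n_x))                         # P[X]
p_lam_given_x = rng.dirichlet(np.ones(n_lam), size=n_x)   # P[Λ|X]
p_lamp_given_x = rng.dirichlet(np.ones(n_lam), size=n_x)  # P[Λ'|X]

# Λ'' is a fresh copy of Λ: P[Λ''|X] = P[Λ|X], but sampled independently of Λ' given X.
# Joint: P[X, Λ', Λ''] = P[X] P[Λ'|X] P[Λ''|X]
joint = p_x[:, None, None] * p_lamp_given_x[:, :, None] * p_lam_given_x[:, None, :]

# Check: Λ' ⊥ Λ'' | X holds by construction --
# P[Λ', Λ''|X] factors as P[Λ'|X] P[Λ''|X] for every x.
cond = joint / p_x[:, None, None]
assert np.allclose(cond, p_lamp_given_x[:, :, None] * p_lam_given_x[:, None, :])

# Check: Λ'' has the same conditional as Λ, i.e. P[Λ''|X] = P[Λ|X].
p_lampp_given_x = joint.sum(axis=1) / p_x[:, None]
assert np.allclose(p_lampp_given_x, p_lam_given_x)
```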
Hopefully that clarifies what the math is, at least. It’s still a bit fishy conceptually, and I’m not convinced it’s the best way to handle the part it’s trying to handle.
Thank you, that is very clarifying!