Thanks, you’re totally right about the equal variance thing: I had stupidly thought that the projection of $U([0,1]^2)$ onto the line $y = x$ would be uniform on $[-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}]$ (obviously false!).
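(For concreteness, a quick check of why it’s false: if $X, Y \sim U([0,1])$ i.i.d. and the projected coordinate is centered at the image of the square’s midpoint, then $Z = (X + Y - 1)/\sqrt{2}$. The sum of two independent uniforms has a triangular density, so

$$ f_Z(z) = \sqrt{2} - 2\,\lvert z \rvert, \qquad z \in \left[ -\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}} \right], $$

a tent peaking at $0$ rather than a flat line.)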
The case of a fully discrete distribution (supported in this case on four points) seems like a very special case of something more general, where a “more typical” special case would be something like:
if $a$, $b$ are both false, then sample from $\mathcal{N}(0, \Sigma)$
if $a$ is true and $b$ is false, then sample from $\mathcal{N}(\mu_a, \Sigma)$
if $a$ is false and $b$ is true, then sample from $\mathcal{N}(\mu_b, \Sigma)$
if $a$ and $b$ are true, then sample from $\mathcal{N}(\mu_a + \mu_b, \Sigma)$
for some $\mu_a, \mu_b \in \mathbb{R}^n$ and covariance matrix $\Sigma$. In general, I don’t really expect the class-conditional distributions to be Gaussian, nor do I expect the class-conditional covariances to be independent of the class. But I do expect something broadly like this: the distributions are concentrated around their class-conditional means, with probability falling off as you move further from the class-conditional mean (hence unimodality), and the class-conditional variances are not too big relative to the distances between the clusters.
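To make this concrete, here is a minimal sampling sketch of the mixture above, assuming hypothetical values for $\mu_a$, $\mu_b$, and $\Sigma$ (chosen so the clusters are well separated relative to the within-cluster spread; none of these values come from the discussion itself):

```python
# Minimal sketch of the four-component mixture described above.
# mu_a, mu_b, and Sigma are hypothetical, chosen for illustration only.
import numpy as np

rng = np.random.default_rng(0)

n = 2                              # ambient dimension
mu_a = np.array([4.0, 0.0])        # hypothetical mean shift for a
mu_b = np.array([0.0, 4.0])        # hypothetical mean shift for b
Sigma = 0.25 * np.eye(n)           # shared covariance, small vs. cluster gaps

def sample(a: bool, b: bool) -> np.ndarray:
    """Draw one point from the class-conditional Gaussian for (a, b)."""
    mean = np.zeros(n)
    if a:
        mean = mean + mu_a
    if b:
        mean = mean + mu_b         # a and b contribute additively
    return rng.multivariate_normal(mean, Sigma)

# Each of the four class-conditional distributions is unimodal, but the
# overall mixture has four well-separated modes.
for a in (False, True):
    for b in (False, True):
        pts = np.array([sample(a, b) for _ in range(1000)])
        print(f"a={a}, b={b}: empirical mean ≈ {pts.mean(axis=0).round(2)}")
```

With 1,000 samples per cell, the empirical means come out near $0$, $\mu_a$, $\mu_b$, and $\mu_a + \mu_b$ respectively.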
Given that longer explanation, does the unimodality thing still seem directionally wrong?
Oops, I misunderstood what you meant by unimodality earlier. Your comment seems broadly correct now (except for the variance thing). I would still guess that unimodality isn’t precisely the right well-behavedness desideratum, but I retract the “directionally wrong”.