Thanks for the reply. I certainly agree that “factor analysis” often doesn’t make that assumption, though it was my impression that it’s commonly made in this context. I suppose how misleading this is depends on how often people assume isotropic noise when looking at this kind of data?
In any case, I’ll try to think about how to clarify this without getting too technical. (I actually had some more details about this at one point but was persuaded to remove them for the sake of being more accessible.)
I’m not sure how often people assume equal noise in all measurements, but I suspect it’s more often than they should—there must be a temptation to do so in order that simple methods like SVD can be used (just like Bayesian statisticians sometimes use “conjugate” priors because they’re analytically tractable, even if they’re inappropriate for the actual problem).
Note that it’s not really just literal “measurement noise”, but also any other sources of variation that affect only one measured variable.
Thanks, I clarified the noise issue. Regarding factor analysis, could you check if I understand everything correctly? Here’s what I think is the situation:
We can write a factor analysis model (with a single factor) as
x = wg + e
where:
x is the observed data
g ∼ N(0, 1) is a random latent variable
w ∈ R^n is some vector (a parameter)
e ∼ N(0, Σ) is a random noise variable
Σ is the covariance of the noise (a parameter)
It always holds (assuming g and e are independent) that
Cov[x] = ww^T + Σ.
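As a sanity check, this identity can be verified numerically by simulating the model (the parameter values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5
w = rng.normal(size=n)                          # made-up loading vector (a parameter)
Sigma = np.diag(rng.uniform(0.1, 1.0, size=n))  # made-up diagonal noise covariance

g = rng.normal(size=100_000)                    # latent factor, g ~ N(0, 1)
e = rng.multivariate_normal(np.zeros(n), Sigma, size=100_000)  # noise, e ~ N(0, Sigma)
X = g[:, None] * w[None, :] + e                 # x = wg + e, shape (100000, n)

# the empirical covariance of x should be close to w w^T + Sigma
print(np.abs(np.cov(X.T) - (np.outer(w, w) + Sigma)).max())
```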
In the simplest variant of factor analysis (the one in the current post) we use Σ = aI, in which case you get that
Cov[x] = ww^T + aI.
You can check whether this model fits by (1) checking that x is Normal and (2) checking whether the covariance of x can be decomposed as in the above equation. (The latter is equivalent to the covariance having all singular values the same except one.)
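The spectral condition is easy to see numerically: with Σ = aI, the covariance ww^T + aI has the eigenvalue a repeated n−1 times plus one larger eigenvalue a + ||w||². A small sketch with made-up values:

```python
import numpy as np

n, a = 6, 0.5
w = np.random.default_rng(1).normal(size=n)   # made-up loading vector
C = np.outer(w, w) + a * np.eye(n)            # covariance under isotropic noise

eigvals = np.sort(np.linalg.eigvalsh(C))
# the bottom n-1 eigenvalues all equal the noise level a,
# and the single top eigenvalue is a + ||w||^2
print(eigvals)
```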
The next slightly-less-simple variant of factor analysis (which I think you’re suggesting) would be to use Σ = diag(a), where a is a vector, in which case you get that
Cov[x] = ww^T + diag(a).
You can again check whether this model fits by (1) checking that x is Normal and (2) checking whether the covariance of x can be decomposed as in the above equation. (The difference is that this no longer reduces to a simple singular-value condition.)
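To illustrate the difference, here’s a sketch with made-up values showing that once the noise variances are unequal, the bottom n−1 eigenvalues of ww^T + diag(a) are no longer all equal, so the simple spectral test from the isotropic case no longer applies:

```python
import numpy as np

w = np.array([1.0, 2.0, -1.0, 0.5])   # made-up loadings
a = np.array([0.3, 0.6, 0.2, 0.4])    # unequal per-variable noise variances
C = np.outer(w, w) + np.diag(a)

eigvals = np.sort(np.linalg.eigvalsh(C))
# unlike the isotropic case, the bottom n-1 eigenvalues are now distinct
print(eigvals)
```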
Do I have all that right?
Assuming you’re using “C” to denote covariance (“Cov” is the more common notation), that seems right.
It’s typical for the noise covariance to be diagonal, since a fully general noise covariance would make the latent variable unnecessary: the whole covariance matrix of x could be explained by the covariance of the “noise”, which would then actually include the signal as well. (Though it could be that some people use a non-diagonal noise covariance subject to some other sort of constraint that keeps the procedure meaningful.)
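One way to see why an unconstrained noise covariance makes the model vacuous: for any observed covariance C you can take w = 0 and Σ = C, reproducing C exactly with the latent variable doing no work. A toy illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5))
C = A @ A.T + np.eye(5)   # an arbitrary valid (positive definite) covariance

w = np.zeros(5)           # the "factor" explains nothing...
Sigma = C                 # ...and the unconstrained "noise" absorbs all of C
# w w^T + Sigma reproduces C exactly, so no restriction is imposed
```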
Of course, it is very typical for people to use factor analysis models with more than one latent variable. There’s no a priori reason why “intelligence” couldn’t have a two-dimensional latent variable. In any real problem, we of course don’t expect a model that falls short of a fully general covariance matrix to be exactly correct, but it’s scientifically interesting if a restricted model (e.g., just one latent variable) is close to being correct, since that points to possible underlying mechanisms.
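For what it’s worth, the multi-factor version just replaces the vector w with a loading matrix W ∈ R^(n×k), giving Cov[x] = WW^T + diag(a). scikit-learn’s FactorAnalysis fits this diagonal-noise model directly; here’s a sketch on data simulated from made-up parameters (W itself is only identified up to rotation, but WW^T + diag(a) is identified):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n, k = 6, 2
W = rng.normal(size=(n, k))                    # made-up loading matrix (parameter)
a = rng.uniform(0.2, 0.8, size=n)              # made-up diagonal noise variances

G = rng.normal(size=(50_000, k))               # k independent latent factors per sample
E = rng.normal(size=(50_000, n)) * np.sqrt(a)  # independent per-variable noise
X = G @ W.T + E                                # x = W g + e

fa = FactorAnalysis(n_components=k).fit(X)
# the fitted model covariance should approximate the true W W^T + diag(a)
C_fit = fa.components_.T @ fa.components_ + np.diag(fa.noise_variance_)
```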