I believe that this sort of rotation (without the PCA) has actually been used in certain causal inference algorithms. As far as I can tell, though, it basically assumes that causality flows from variables with higher kurtosis to variables with lower kurtosis. That admittedly seems plausible in a lot of cases, but it also seems like it would consistently give the wrong results if you’ve got certain nonlinear/thresholding effects (which seem plausible in some of the areas where I’ve been looking to apply it).
Where did you get this notion about kurtosis? Factor analysis and PCA take only a correlation matrix as input, so they model only the second-order moments of the joint distribution (i.e. correlations/variances/covariances, but not kurtosis). In fact, it is sometimes assumed in factor analysis that all variables and latent factors are jointly multivariate normal (and so all random variables have excess kurtosis 0).
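To make the second-moments point concrete, here is a minimal sketch (assuming NumPy; the helper names `whiten` and `excess_kurtosis`, the mixing matrix, and the sample size are made up for illustration). It builds one Gaussian and one heavy-tailed dataset with the same sample covariance, so a covariance-based PCA cannot tell them apart even though their kurtosis differs.

```python
import numpy as np

def whiten(x):
    """Center x and transform it so its sample covariance is the identity."""
    x = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    return x @ evecs @ np.diag(evals ** -0.5) @ evecs.T

def excess_kurtosis(x):
    """Per-column excess kurtosis (0 for a normal distribution)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z ** 4).mean(axis=0) - 3.0

rng = np.random.default_rng(0)
n = 20000
mix = np.array([[2.0, 0.0], [1.0, 1.0]])  # hypothetical mixing matrix

# Whiten first, then apply the same mixing, so both datasets
# end up with the same sample covariance (mix @ mix.T).
gauss = whiten(rng.normal(size=(n, 2))) @ mix.T
heavy = whiten(rng.laplace(size=(n, 2))) @ mix.T

cov_gauss = np.cov(gauss, rowvar=False)
cov_heavy = np.cov(heavy, rowvar=False)

# PCA here is just the eigendecomposition of the covariance matrix.
evals_gauss = np.linalg.eigh(cov_gauss)[0]
evals_heavy = np.linalg.eigh(cov_heavy)[0]

kurt_gauss = excess_kurtosis(gauss)
kurt_heavy = excess_kurtosis(heavy)

print("same covariance:", np.allclose(cov_gauss, cov_heavy))
print("same PCA eigenvalues:", np.allclose(evals_gauss, evals_heavy))
print("excess kurtosis (Gaussian):", kurt_gauss)
print("excess kurtosis (Laplace): ", kurt_heavy)
```

Anything that only sees the covariance matrix gives the same answer for both datasets; the difference shows up only in the fourth moments.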
A Bayes net is not the same thing as PCA/factor analysis, in part because it tries to factor the entire joint distribution rather than just the correlation matrix.
This part of the comment wasn’t about PCA/FA, hence “without the PCA”. The formal name for what I had in mind is ICA, which often works by maximizing kurtosis.
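For anyone unfamiliar with the idea, here is a rough two-dimensional sketch of kurtosis-based ICA (assuming NumPy; this is a brute-force grid search over rotation angles for illustration, not FastICA or any particular library's algorithm, and the sources, mixing matrix, and helper names are invented). The data is whitened first, which uses up the second-order information, and then the rotation that maximizes the total absolute excess kurtosis is picked.

```python
import numpy as np

def whiten(x):
    """Center x and transform it so its sample covariance is the identity."""
    x = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    return x @ evecs @ np.diag(evals ** -0.5) @ evecs.T

def excess_kurtosis(x):
    """Per-column excess kurtosis (0 for a normal distribution)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z ** 4).mean(axis=0) - 3.0

def rotation(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(1)
n = 20000
sources = rng.laplace(size=(n, 2))          # independent heavy-tailed sources
mix = np.array([[1.0, 0.5], [0.3, 1.0]])    # hypothetical mixing matrix
observed = sources @ mix.T

white = whiten(observed)

# After whitening, only a rotation is left undetermined. Searching
# [0, pi/2] is enough: a further quarter turn only swaps the two
# axes and flips a sign.
angles = np.linspace(0.0, np.pi / 2, 181)
scores = [np.abs(excess_kurtosis(white @ rotation(t))).sum() for t in angles]
best = angles[int(np.argmax(scores))]
recovered = white @ rotation(best)

# Cross-correlations between recovered components (rows) and true sources (columns).
corr = np.corrcoef(recovered.T, sources.T)[:2, 2:]
print(np.round(corr, 2))
```

Each recovered component should line up with one of the original sources up to order and sign. Note that this would not work with Gaussian sources, since every rotation of whitened Gaussian data has the same (zero) excess kurtosis.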
What you seemed to be saying is that a certain rotation (“one should rotate them so that the resulting axes have a sparse relationship with the original cases”) has “actually been used” and “it basically assumes that causality flows from variables with higher kurtosis to variables with lower kurtosis”.
I don’t see what the kurtosis-maximizing algorithm has to do with the choice of rotation used in factor analysis or PCA.