Interesting! Perhaps one way to avoid being fooled in such situations is a non-parametric statistical test. For example, we could apply permutation testing: by resampling the data to break its correlation structure (e.g., shuffling each feature independently) and performing PCA on each permuted dataset, we can form a null distribution of eigenvalues. Comparing the eigenvalues of the original data against this null distribution then tells us whether the observed structure is unlikely under randomness: we look at how extreme each original eigenvalue is relative to those from the permuted data and decide whether to reject the null hypothesis that the data has no correlation structure. This could be computationally prohibitive, but there may be ways around that (reducing the number of permutations, downsampling the data, ...). Note also that failing to reject the null does not mean the data is random. Somehow these kinds of tests don't seem to be common practice in interpretability?
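A minimal sketch of what I have in mind (the shuffling-columns-independently choice for breaking correlations, the helper names, and the toy data are all my own assumptions, not a reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_eigenvalues(X):
    # Eigenvalues of the sample covariance matrix, largest first
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def permutation_pca_test(X, n_perm=200, rng=rng):
    """Shuffle each column independently to break cross-feature
    correlations, redo PCA on each permuted dataset, and compare
    the observed eigenvalues to the resulting null distribution."""
    observed = pca_eigenvalues(X)
    null = np.empty((n_perm, X.shape[1]))
    for i in range(n_perm):
        Xp = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        null[i] = pca_eigenvalues(Xp)
    # One-sided p-value per component: fraction of permuted
    # eigenvalues at least as large as the observed one
    # (the +1 terms give the standard permutation-test estimate)
    pvals = (1 + (null >= observed).sum(axis=0)) / (n_perm + 1)
    return observed, null, pvals

# Toy data: two features driven by a shared latent variable
# (real structure) plus three pure-noise features
n = 500
latent = rng.normal(size=n)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=n),
    latent + 0.1 * rng.normal(size=n),
    rng.normal(size=(n, 3)),
])

observed, null, pvals = permutation_pca_test(X)
```

On the toy data above, the top eigenvalue should come out highly significant (the shared latent factor survives), while the trailing components should not, which is exactly the "is this component real or just noise?" question.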