Is there a reason you did 300⁄400 randomly sampled indices, instead of evenly spaced indices (e.g. every 1⁄300 of the total training steps)?
No!
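(For concreteness, a minimal sketch of the two sampling schemes — the step count and checkpoint count here are made up:)

```python
import numpy as np

rng = np.random.default_rng(0)
total_steps = 30_000   # hypothetical stand-in for the real number of training steps
n_checkpoints = 300    # number of checkpoints to analyse

# Random sampling (what was done): indices drawn uniformly without replacement.
random_idx = np.sort(rng.choice(total_steps, size=n_checkpoints, replace=False))

# Evenly spaced alternative: one index every total_steps / n_checkpoints steps.
even_idx = np.linspace(0, total_steps - 1, n_checkpoints, dtype=int)
```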
Did you subtract the mean of the weights before doing the SVD? Otherwise, the first component is probably the mean of the 300⁄400 weight vectors.
Ah, this is a good idea! I’ll make sure to incorporate it, thanks!
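(In case it's useful, a minimal numpy sketch of what the suggestion amounts to — the shapes are made up:)

```python
import numpy as np

# Hypothetical matrix of flattened weight snapshots, one checkpoint per row.
W = np.random.randn(300, 10_000)

# Subtract the per-dimension mean first; otherwise the top singular vector
# mostly just recovers the mean of the 300 weight vectors.
W_centered = W - W.mean(axis=0, keepdims=True)

U, S, Vt = np.linalg.svd(W_centered, full_matrices=False)
```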
Unfortunately, interpreting the other big components turned out to be pretty non-trivial. This is despite the fact that many parts of the final network have low-rank approximations capturing >99% of the variance, that we know the network is getting sparser in the Fourier basis, and that the function of the network is understood well enough that you can literally read off the trig identities being used at the MLP layer. So I’m not super confident that purely unsupervised linear methods will actually help much with interpretability here.
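(For reference, one standard way a ">99% of the variance" figure for a rank-k approximation can be computed — this sketch mean-centers first, per the suggestion above:)

```python
import numpy as np

def variance_explained(W, k):
    """Fraction of variance captured by a rank-k SVD approximation
    of the mean-centered matrix W (one weight vector per row)."""
    Wc = W - W.mean(axis=0, keepdims=True)
    S = np.linalg.svd(Wc, compute_uv=False)
    # Squared singular values sum to the squared Frobenius norm,
    # so this ratio is the fraction of variance the top k directions capture.
    return (S[:k] ** 2).sum() / (S ** 2).sum()
```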
Interesting. I’ll be sure to read what he’s written to see if it’s what I’d do.
Thanks for the pointer, and thanks for the overall very helpful comment!