Is there a reason you did 300⁄400 randomly sampled indices, instead of evenly spaced indices (e.g. every 1⁄300 of the total training steps)?
No!
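(For concreteness, a minimal sketch of the two sampling schemes — the step count and checkpoint count here are made up:)

```python
import numpy as np

rng = np.random.default_rng(0)
total_steps = 30_000   # hypothetical stand-in for the real number of training steps
n_checkpoints = 300    # number of checkpoints to analyse

# Random sampling (what was done): indices drawn uniformly without replacement.
random_idx = np.sort(rng.choice(total_steps, size=n_checkpoints, replace=False))

# Evenly spaced alternative: one index every total_steps / n_checkpoints steps.
even_idx = np.linspace(0, total_steps - 1, n_checkpoints, dtype=int)
```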
Did you subtract the mean of the weights before doing the SVD? Otherwise, the first component is probably the mean of the 300⁄400 weight vectors.
Ah, this is a good idea! I’ll make sure to incorporate it, thanks!
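(In case it's useful, a minimal numpy sketch of what the suggestion amounts to — the shapes are made up:)

```python
import numpy as np

# Hypothetical matrix of flattened weight snapshots, one checkpoint per row.
W = np.random.randn(300, 10_000)

# Subtract the per-dimension mean first; otherwise the top singular vector
# mostly just recovers the mean of the 300 weight vectors.
W_centered = W - W.mean(axis=0, keepdims=True)

U, S, Vt = np.linalg.svd(W_centered, full_matrices=False)
```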
Unfortunately, interpreting the other big components turned out to be pretty non-trivial. This is despite the fact that many parts of the final network have low-rank approximations capturing >99% of the variance, that we know the network is getting sparser in the Fourier basis, and that the function of the network is understood well enough that you can literally read off the trig identities being used at the MLP layer. So I’m not super confident that purely unsupervised linear methods will actually help much with interpretability here.
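(For reference, one standard way a ">99% of the variance" figure for a rank-k approximation can be computed — this sketch mean-centers first, per the suggestion above:)

```python
import numpy as np

def variance_explained(W, k):
    """Fraction of variance captured by a rank-k SVD approximation
    of the mean-centered matrix W (one weight vector per row)."""
    Wc = W - W.mean(axis=0, keepdims=True)
    S = np.linalg.svd(Wc, compute_uv=False)
    # Squared singular values sum to the squared Frobenius norm,
    # so this ratio is the fraction of variance the top k directions capture.
    return (S[:k] ** 2).sum() / (S ** 2).sum()
```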
Interesting. I’ll be sure to read what he’s written to see if it’s what I’d do.
Thanks for the pointer, and thanks for the overall very helpful comment!