Garrett Baker comments on Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values

Garrett Baker 21 Dec 2022 1:50 UTC
1 point
Oh yeah, I just remembered I had a way to check whether our cutoff actually gives a good approximation: take the low-rank approximation of the gradient update matrix induced by the cutoff, use it in place of the real gradient updates, then look at the loss of the resulting alt model. If that loss is close to the original model's, the cutoff is presumably keeping what matters.
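
Here is a rough sketch of what that check could look like. It is not the MNIST setup from the post: it assumes a toy linear regression trained with plain gradient descent, and it assumes the update matrix has one row per training step and one column per parameter. The alt model here is built by replaying the truncated updates from the same initial weights (gradients are not recomputed along the alt trajectory), which is only one way to read the idea. All names (`train_and_record`, `low_rank`, `replay`) are made up for illustration.

```python
import numpy as np

def loss(w, X, y):
    # Mean squared error of the linear model X @ w.
    return float(np.mean((X @ w - y) ** 2))

def train_and_record(X, y, w0, lr=0.05, steps=200):
    # Plain gradient descent; record every update as one row of a matrix.
    w = w0.copy()
    updates = []
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        step = -lr * grad
        updates.append(step)
        w = w + step
    return w, np.stack(updates)  # shape: (steps, n_params)

def low_rank(update_matrix, k):
    # Keep only the top-k singular values/vectors (the "cutoff").
    U, S, Vt = np.linalg.svd(update_matrix, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k]

def replay(w0, update_matrix):
    # Alt model: apply the (approximated) updates to the initial weights.
    return w0 + update_matrix.sum(axis=0)

# Toy data (assumption: stands in for the real training set).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=256)
w0 = np.zeros(20)

w_full, updates = train_and_record(X, y, w0)
full_loss = loss(w_full, X, y)

# Compare the alt model's loss to the original at several cutoffs.
alt_losses = {}
for k in (1, 2, 5, 20):
    w_alt = replay(w0, low_rank(updates, k))
    alt_losses[k] = loss(w_alt, X, y)
    print(f"rank {k:2d}: alt loss = {alt_losses[k]:.6f}, original loss = {full_loss:.6f}")
```

If the alt loss at your chosen cutoff is far from the original loss, the singular values you dropped were doing real work. A stricter version would rerun training and project each fresh gradient onto the top-k right singular vectors, since replaying ignores how earlier approximation errors would change later gradients.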