Neel Nanda comments on The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

Neel Nanda 29 Nov 2022 23:39 UTC
LW: 8 AF: 5
Thanks for sharing this! I’m excited to see more interpretability posts. (Though this felt far too high production value—more posts, shorter posts and lower effort per post plz)

If we plot the distribution of the singular vectors, we can see that the rank only slowly decreases until 64 then rapidly decreases. This is because, fundamentally, the OV matrix is only of rank 64. The singular value distribution of the meaningful ranks, however, declines slowly in log-space, giving at least some evidence towards the idea that the network is utilizing most of the ‘space’ available in this OV circuit head.

Quick feedback that the graph after this paragraph feels sketchy to me—obviously the singular values are zero beyond 64, and they’re so far low down that all singular values above look identical. But the y axis is screwed up, so you can’t really see this. What does the graph look like if you fix it? To me, it looks like there’s actually some sparsity and the early singular values are far larger (looks like there’s a big kink at the start, though it looks tiny because we’re so zoomed out).

I also personally think that a linear scale is often more principled for a spectrum graph, but not confident in that take.
- beren 2 Dec 2022 16:12 UTC
  1 point
  Quick feedback that the graph after this paragraph feels sketchy to me—obviously the singular values are zero beyond 64, and they’re so far low down that all singular values above look identical. But the y axis is screwed up, so you can’t really see this. What does the graph look like if you fix it?
  Indeed, in retrospect presenting the graph this way seems to have confused a lot of people. I have now updated it to cut off at 64 and show only the spectrum up to that point, where we see a clear exponential decay in singular values (which still remain not too small) all the way down to 64, with a slightly faster than exponential decay at the start. All the code is in the colab, so you can set it to linear scale if you want. Personally I think that log-scaling tends to make more sense for spectrum graphs, as they are usually exponentials or power laws.
  Thanks for sharing this! I’m excited to see more interpretability posts. (Though this felt far too high production value—more posts, shorter posts and lower effort per post plz)
  Indeed, we will be aiming for more rapid shorter posts in the near future. Stay tuned.
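
The plotting issue discussed in this thread can be reproduced with a small sketch. This is not the code from the post's colab: the matrices are random stand-ins, and the sizes (d_model = 256, d_head = 64) and the function names are assumptions chosen for illustration. It shows why the singular values past index 64 are numerically zero for a rank-64 OV product, and why cutting the spectrum at the head dimension before plotting keeps that floor from flattening the axis.

```python
import numpy as np


def ov_singular_values(w_v, w_o):
    """Return the singular values of the OV product, largest first."""
    # (d_model, d_head) @ (d_head, d_model): the product has rank at most d_head.
    w_ov = w_v @ w_o
    return np.linalg.svd(w_ov, compute_uv=False)


def truncate_to_rank(singular_values, rank):
    """Keep only the first `rank` singular values (the meaningful ones)."""
    return singular_values[:rank]


# Hypothetical sizes and random weights, not taken from any real model.
rng = np.random.default_rng(0)
d_model, d_head = 256, 64
w_v = rng.standard_normal((d_model, d_head))
w_o = rng.standard_normal((d_head, d_model))

s = ov_singular_values(w_v, w_o)
head = truncate_to_rank(s, d_head)  # what the updated graph would show
tail = s[d_head:]                   # floating-point noise, effectively zero

print("largest singular value:      ", head[0])
print("smallest meaningful value:   ", head[-1])
print("largest value beyond rank 64:", tail.max())
```

To compare the two scalings argued for above, plot `head` with matplotlib's `plt.semilogy(head)` for the log view and `plt.plot(head)` for the linear view. Plotting the untruncated `s` on a log axis reproduces the original problem, since the near-zero tail stretches the y axis by many orders of magnitude.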