This seems like an important point, but I am not sure I completely follow. How do rays differ from directions here? I agree that the SVD directions won’t recover any JL-style dense packing of directions, since the SVD yields at most as many orthogonal directions as the rank of the matrix. The thinking here is then that if the model tends to pack semantically similar directions into closely related dimensions, the SVD would pick up on at least an average of this and represent it.
I also think something to keep in mind is that we are doing the SVDs over the OV and MLP weights, not over activations. That is, these are the directions in which the weight matrix most strongly stretches the activation space. We don’t necessarily expect the weight matrix itself to be doing its own JL packing, and it seems reasonable that the SVD would find sensible directions here. It is of course possible that the network isn’t relying on the principal SVD directions for its true ‘semantic’ processing, but instead performs the stretching/compressing along some intermediate direction composed of multiple SVD directions, and we can’t rule that out with this method.
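To make the weights-vs-activations point concrete, here is a minimal sketch (with a random stand-in matrix, not an actual OV or MLP weight) of what "directions in which the weight matrix most strongly stretches the activation space" means: the top right-singular vector is the unit input direction whose image under the matrix has the largest norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an OV or MLP weight matrix; in practice this
# would come from a trained transformer's weights.
W = rng.normal(size=(64, 64))

# SVD of the *weights*, not of any activation data: W = U diag(S) Vt.
U, S, Vt = np.linalg.svd(W)

# The top right-singular vector v0 is the input direction the matrix
# stretches the most; W maps it onto S[0] * u0.
v0, u0 = Vt[0], U[:, 0]
assert np.allclose(W @ v0, S[0] * u0)

# No unit vector is stretched by more than the top singular value.
x = rng.normal(size=64)
x /= np.linalg.norm(x)
assert np.linalg.norm(W @ x) <= S[0] + 1e-9
```

This also illustrates the caveat above: nothing stops the network from computing along some unit direction that mixes several of these singular vectors, and the decomposition alone cannot distinguish that case.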