This reminded me of some findings associated with “latent semantic analysis”, an old-school information retrieval technique. You build a big matrix where each unique term in a corpus (excluding a stoplist of extremely frequent terms) is assigned to a row, each document is assigned to a column, and each cell holds the number of times term t_i appeared in document d_j (typically with some kind of weighting scheme that downweights frequent terms), and then you take the SVD. This also gives you interpretable dimensions, at least if you use varimax rotation. See for example pp. 9-11 & pp. 18-20 of this paper. Also, I seem to recall that the positive and negative ends of the singular vectors after doing latent semantic analysis are often both semantically interpretable, sometimes forming antipodal pairs, although I can’t find the paper where I saw this.
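For concreteness, here's a minimal sketch of the pipeline described above, using NumPy only. The toy corpus, the stoplist, and the choice of tf-idf as the weighting scheme are all illustrative assumptions (classic LSA papers often use log-entropy weighting instead):

```python
# Minimal LSA sketch: term-by-document counts -> weighting -> truncated SVD.
# Corpus, stoplist, and tf-idf weighting are illustrative choices.
import numpy as np

docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks rise on earnings",
    "earnings lift stocks",
]
stoplist = {"on"}  # stand-in for a stoplist of extremely frequent terms

# Term-by-document count matrix: one row per unique term, one column per doc.
vocab = sorted({w for d in docs for w in d.split()} - stoplist)
index = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        if w in index:
            X[index[w], j] += 1

# Downweight frequent terms; here via idf = log(N / document frequency).
df = (X > 0).sum(axis=1)
X = X * np.log(len(docs) / df)[:, None]

# SVD: rows of U give term coordinates, columns of Vt give doc coordinates.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # keep the top-k latent dimensions
terms_k = U[:, :k] * s[:k]      # terms in the k-dim latent space
docs_k = Vt[:k, :].T * s[:k]    # documents in the same latent space
```

On this tiny corpus the first two latent dimensions separate the "animal" documents from the "finance" documents, so documents sharing vocabulary land close together in the reduced space.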
I’m not sure whether the right way to think about this is “you should be very circumspect about saying that ‘semantic processing’ is going on just because the SVD has interpretable dimensions, because you get that merely by taking the SVD of a slightly preprocessed word-by-document matrix”, or rather “a lot of what we call ‘semantic processing’ in humans is probably just down to pretty simple statistical associations, which the later layers seem to be picking up on”, but it seemed worth mentioning in any case!
edit: seems likely that the “association clusters” seen in the earlier layers might map onto what latent semantic analysis is picking up on, whereas the later layers might be picking up on semantic relationships that aren’t as directly reflected in the surface-level statistical associations. could be tested!