Yes, this is correct. SVD necessarily cannot recover the full JL packing. Since we don't know the extent to which the network actually uses the full JL capacity, SVD might still recover a reasonable fraction of the relevant directions. Also, if the network packs semantically similar vectors close to one another, then an SVD direction might represent some kind of useful average of them.
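To make this concrete, here is a minimal toy sketch (my own illustration, not from the comment above) of why SVD can only return up to d orthogonal directions even when many more nearly-orthogonal features are packed into a d-dimensional space, and how a top singular vector ends up overlapping with many packed features at once. All sizes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_features = 64, 512          # 512 nearly-orthogonal features packed into 64 dims
features = rng.standard_normal((n_features, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Sparse activations: each sample activates a handful of features.
n_samples, k_active = 10_000, 4
acts = np.zeros((n_samples, d))
for i in range(n_samples):
    idx = rng.choice(n_features, size=k_active, replace=False)
    acts[i] = rng.exponential(size=k_active) @ features[idx]

# SVD yields at most d orthogonal directions, far fewer than the 512 features.
U, S, Vt = np.linalg.svd(acts, full_matrices=False)
print(Vt.shape)                  # (64, 64): a 64-direction basis, not 512 directions

# A right singular vector typically correlates with many packed features at once,
# i.e. it behaves like a weighted average of a cluster of similar directions.
overlaps = np.abs(Vt[0] @ features.T)
print(f"features with |cos| > 0.1 against top singular vector: {(overlaps > 0.1).sum()}")
```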
Indeed, as a parallel project we are looking at sparse coding to try to construct an overcomplete basis. Stay tuned for this.
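For readers unfamiliar with the idea, below is a hedged sketch of what "sparse coding to construct an overcomplete basis" can look like in practice, using scikit-learn's dictionary learning on stand-in activation data. This is only an illustration of the general technique, not the method or code used in the parallel project.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
acts = rng.standard_normal((5_000, 64))        # stand-in for model activations

# Learn a 4x overcomplete dictionary: 256 candidate directions in a 64-dim space.
dico = MiniBatchDictionaryLearning(
    n_components=256,            # overcomplete: n_components > data dimension
    alpha=1.0,                   # sparsity penalty on the codes
    batch_size=256,
    random_state=0,
)
codes = dico.fit_transform(acts)               # sparse coefficients per sample
directions = dico.components_                  # (256, 64) learned feature directions
print(directions.shape, f"mean nonzeros per code: {(codes != 0).sum(1).mean():.1f}")
```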