johnswentworth comments on Searching for Modularity in Large Language Models

johnswentworth 9 Sep 2022 0:49 UTC
Nice work!
Some comments on the interpretation of some of the graphs:
- SVD graphs: most SVD implementations are only precise down to singular values of ~1e-8 times the maximum singular value. So in the graphs where the singular values fall off sharply to about 1e-7 (with a maximum singular value of about 1e1 or 1e2) and then flatten out, what’s actually going on is almost certainly that the later singular values are basically zero, and what’s plotted for them is dominated by noise from numerical imprecision.
  - Also in those graphs: it looks like there is a “tight pass” after the first few tens of singular values, which are an order of magnitude higher than the relatively flat slope that follows.
- Cosine similarity graphs: boy, those sure do look diagonal plus low-rank. I wonder what the dimension of the low-rank components is, and whether they correspond to anything meaningful.
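To make the noise-floor point concrete, here is a minimal sketch in plain Python. It assumes you already have the singular values as a list. The cutoff `sigma_max * max(M, N) * eps` is one common convention (it is the default tolerance of NumPy's `numpy.linalg.matrix_rank`), not necessarily what the post used. `largest_gap` is a crude, hypothetical way to locate a “tight pass”: the biggest multiplicative drop among the singular values that sit above the floor. All function names and the example spectrum are made up for illustration.

```python
import sys

# Machine epsilon for IEEE-754 single and double precision.
FLOAT32_EPS = 2.0 ** -23              # about 1.19e-7
FLOAT64_EPS = sys.float_info.epsilon  # about 2.22e-16


def noise_floor(singular_values, shape, eps=FLOAT64_EPS):
    """Cutoff below which singular values are indistinguishable from rounding error.

    Uses sigma_max * max(M, N) * eps, the same convention as NumPy's
    default tolerance in numpy.linalg.matrix_rank.
    """
    return max(singular_values) * max(shape) * eps


def numerical_rank(singular_values, shape, eps=FLOAT64_EPS):
    """Number of singular values that sit above the noise floor."""
    if not singular_values:
        return 0
    tol = noise_floor(singular_values, shape, eps)
    return sum(1 for s in singular_values if s > tol)


def largest_gap(singular_values, shape, eps=FLOAT64_EPS):
    """Return k such that the biggest multiplicative drop among the
    above-floor singular values lies between the k-th and (k+1)-th value."""
    s = sorted(singular_values, reverse=True)
    s = s[:numerical_rank(s, shape, eps)]  # drop the numerical-noise tail
    if len(s) < 2:
        return len(s)
    ratios = [s[i] / s[i + 1] for i in range(len(s) - 1)]
    return ratios.index(max(ratios)) + 1


# Hypothetical spectrum of a 768x768 matrix: three large values, three
# moderate ones, then a tail around 1e-8 relative to the maximum.
spectrum = [50.0, 45.0, 40.0, 4.0, 3.5, 3.0, 1e-6, 5e-7]

print(numerical_rank(spectrum, (768, 768), FLOAT32_EPS))  # → 6
print(largest_gap(spectrum, (768, 768), FLOAT32_EPS))     # → 3
print(largest_gap(spectrum, (768, 768), FLOAT64_EPS))     # → 6
```

If the SVD was computed in float32, the last two values fall below the floor and the biggest real gap sits after the third value. Treating the same numbers as if they were float64-accurate makes the cliff into the noise tail look like the “biggest gap” instead, which is the misreading described above.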
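One rough way to estimate the dimension of the low-rank part, sketched in plain Python. It assumes the similarity matrix really is D + L with D diagonal and L positive semidefinite of rank r. Zeroing the diagonal removes D entirely and leaves M = L - diag(L); since diag(L) is itself PSD, Weyl's inequality gives λ_i(M) ≤ λ_i(L), so M has at most r positive eigenvalues, and exactly r when every nonzero eigenvalue of L exceeds the largest diagonal entry of L. The function names, the `rel_tol` noise threshold, and the 6x6 example are illustrative assumptions, not anything from the post.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix (list of lists) via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- A P: update columns p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- P^T A: update rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def offdiag_low_rank_estimate(sim, rel_tol=1e-3):
    """Count clearly positive eigenvalues of sim with its diagonal zeroed.

    Under the D + L (L PSD, rank r) assumption this is at most r, and equals r
    when L's signal is spread out. rel_tol crudely filters noise eigenvalues.
    """
    n = len(sim)
    m = [[0.0 if i == j else sim[i][j] for j in range(n)] for i in range(n)]
    eigs = jacobi_eigenvalues(m)
    if not eigs or eigs[0] <= 0.0:
        return 0
    return sum(1 for e in eigs if e > rel_tol * eigs[0])


# Hypothetical 6x6 "cosine similarity" matrix: unit diagonal plus rank-2
# structure from two orthogonal directions (constant and alternating).
u = [0.6] * 6
w = [0.5 * (-1) ** i for i in range(6)]
sim = [[1.0 if i == j else u[i] * u[j] + w[i] * w[j] for j in range(6)] for i in range(6)]
print(offdiag_low_rank_estimate(sim))  # → 2
```

The pure-Python Jacobi loop is only meant to keep the sketch dependency-free; for a real similarity matrix with hundreds of rows, running `numpy.linalg.eigvalsh` on the diagonal-zeroed matrix would be the practical choice.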