nostalgebraist comments on interpreting GPT: the logit lens

nostalgebraist 2 Sep 2020 0:02 UTC
LW: 5 AF: 3
I also thought of PCA/SVD, but I imagine matrix decompositions like these would be misleading here.
What matters here (I think) is not some basis of N_emb orthogonal vectors in embedding space, but some much larger set of ~exp(N_emb) almost orthogonal vectors. We only have 1600 degrees of freedom to tune, but they’re continuous degrees of freedom, and this lets us express >>1600 distinct vectors in vocab space as long as we accept some small amount of reconstruction error.
I expect GPT and many other neural models are effectively working in such a space of nearly orthogonal vectors, and picking/combining elements of it. A decomposition into orthogonal vectors won’t really illuminate this. I wish I knew more about this topic. Are there standard techniques?
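As a rough illustration of the "almost orthogonal" point above, here is a minimal sketch, assuming NumPy and plain random Gaussian directions rather than anything taken from GPT itself. It draws twice as many unit vectors as there are dimensions and reports the largest absolute cosine similarity between any pair. The function name `max_abs_cosine` and the choice of `2 * dim` vectors are hypothetical, picked only for this example, and the sketch says nothing about whether the count really scales like exp(N_emb).

```python
import numpy as np

def max_abs_cosine(n_vectors, dim, seed=0):
    """Largest |cosine similarity| between any two of n random unit vectors."""
    rng = np.random.default_rng(seed)
    # Random Gaussian directions (an assumption; not GPT embeddings).
    v = rng.standard_normal((n_vectors, dim)).astype(np.float32)
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # normalize each row
    gram = v @ v.T                 # all pairwise cosine similarities
    np.fill_diagonal(gram, 0.0)    # ignore each vector's similarity to itself
    return float(np.abs(gram).max())

dim = 1600             # embedding width mentioned in the comment
n_vectors = 2 * dim    # more vectors than an orthogonal basis could hold
worst = max_abs_cosine(n_vectors, dim)
print(f"{n_vectors} vectors in {dim} dims, max |cos| = {worst:.3f}")
```

No more than `dim` vectors can be exactly orthogonal, yet the worst pairwise overlap here should still come out small, on the order of a few times 1/sqrt(dim).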
- Vlad Mikulik 2 Sep 2020 15:42 UTC
  LW: 5 AF: 3
  You might want to look into NMF (non-negative matrix factorization), which, unlike PCA/SVD, doesn’t require its components to be orthogonal. It works well for interpretability because its components are non-negative and so cannot cancel each other out, which makes its features more intuitive to reason about. I think it is essentially what you want, although I don’t think it will let you directly find the ‘larger set of almost orthogonal vectors’ you’re looking for.
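To make the NMF suggestion concrete, here is a minimal sketch using Lee-Seung multiplicative updates, one standard way to fit NMF, written in plain NumPy. Everything in it is an assumption for illustration: the helper name `nmf`, the iteration count, and the toy non-negative data matrix. Real model activations can be negative, so they would need to be made non-negative first (for example by using post-ReLU activations) before something like this applies.

```python
import numpy as np

def nmf(X, n_components, n_iter=500, eps=1e-9, seed=0):
    """Approximate non-negative X (n_samples, n_features) as W @ H, with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components)) + eps
    H = rng.random((n_components, n_features)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative,
        # so components can only add up, never cancel each other.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 100 samples built from 3 hidden non-negative parts.
rng = np.random.default_rng(1)
true_W = rng.random((100, 3))
true_H = rng.random((3, 20))
X = true_W @ true_H

W, H = nmf(X, n_components=3)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.4f}")
print("min entry of W, H:", W.min(), H.min())
```

The rows of `H` play the role of the components and need not be orthogonal to each other. The number of components is still a fixed choice here, so this does not by itself recover a much larger overcomplete set of directions.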