This post reminds me of the Word2vec algebra.
E.g. “kitten” − “cat” + “dog” ≈ “puppy”
I expect that this will be true for LLM token embeddings too. Has anyone checked this?
I expect something similar to hold for internal LLM representations as well, though that might be harder to verify. Then again, maybe not, if you have interpretable SAE vectors?
Case in point: this is a five-year-old t-SNE plot of word vectors on my laptop.
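For the token-embedding version, here's a minimal sketch of how one might check it, assuming the HuggingFace transformers library and the GPT-2 checkpoint (the word choices and the subword-averaging trick are just illustrative, not anything from the post):

```python
# Rough check of "kitten" - "cat" + "dog" ≈ "puppy" on GPT-2's token embedding matrix.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
emb = model.wte.weight.detach()  # (vocab_size, d_model) token embedding matrix

def vec(word: str) -> torch.Tensor:
    # GPT-2 uses BPE, so a word may span several tokens; average their embeddings.
    # The leading space matters for GPT-2's tokenizer.
    ids = tokenizer.encode(" " + word)
    return emb[ids].mean(dim=0)

# Form the analogy vector and rank the whole vocabulary by cosine similarity.
query = vec("kitten") - vec("cat") + vec("dog")
sims = torch.nn.functional.cosine_similarity(query.unsqueeze(0), emb)
for idx in sims.topk(10).indices.tolist():
    print(repr(tokenizer.decode([idx])), round(sims[idx].item(), 3))
```

One known caveat from the word2vec literature: the nearest neighbours of the analogy vector are often the input words themselves, so you'd want to exclude those before declaring success.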