1stuserhere comments on Daniel Tan’s Shortform

1stuserhere 17 Jul 2024 16:18 UTC
3 points

If we train several SAEs from scratch on the same set of model activations, are they “equivalent”?

For SAEs of different sizes, in most layers the smaller SAE does contain features that are highly similar to some of the larger SAE's features, but this isn't always true. I'm working on an upcoming post on this.
- Bart Bussmann 18 Jul 2024 4:48 UTC
  2 points
  Interesting. We find that every feature in a smaller SAE has a counterpart in a larger SAE with cosine similarity > 0.7, but not every feature in a larger SAE has a close relative in the smaller SAE (at a 2x scale-up, about 65% do have a close equivalent).
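The asymmetry above comes from running a nearest-neighbour match in each direction. Below is a minimal Python sketch of that kind of comparison. It assumes each SAE feature is represented by a single direction vector (for example, its decoder column) and uses the 0.7 cosine threshold from the comment. The name `matched_fraction` and the toy vectors are hypothetical and are not taken from either commenter's code.

```python
import math

def _normalize(v):
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matched_fraction(source, target, threshold=0.7):
    """Fraction of feature directions in `source` whose best cosine
    similarity with any direction in `target` exceeds `threshold`.
    (Hypothetical helper; assumes features are compared as direction vectors.)"""
    src = [_normalize(v) for v in source]
    tgt = [_normalize(v) for v in target]
    matched = 0
    for s in src:
        # Best match for this source feature among all target features.
        best = max(sum(a * b for a, b in zip(s, t)) for t in tgt)
        if best > threshold:
            matched += 1
    return matched / len(src)

# Made-up toy directions: the "larger SAE" splits one feature into two
# near-duplicates and adds a direction the smaller SAE lacks.
small = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
large = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(matched_fraction(small, large))  # small -> large
print(matched_fraction(large, small))  # large -> small
```

On this toy data every small-SAE feature finds a match in the large SAE, while one of the four large-SAE features has no counterpart in the small SAE. That mirrors the shape of the reported result, though the numbers here are illustrative only.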