Huh, what is up with the ultra low frequency cluster? If the things are actually firing on the same inputs, then you should really only need one output vector. And if they’re serving some useful purpose, then why is there only one and not more?
Idk man, I am quite confused. It’s possible they’re firing on different inputs—even with the same encoder vector, if you have a different bias then you’ll fire somewhat differently (the lower-threshold unit fires on a superset of what the higher-threshold unit fires on). And cosine sim 0.975 is not the same as 1, so maybe the error term matters...? But idk, my guess is it’s a weird artifact of the autoencoder training process that’s latching onto some weird property of transformers. Being shared across random seeds is by far the weirdest result, which suggests it can’t just be a random artifact of one training run
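The nesting claim above can be sketched numerically. This is a minimal toy, assuming the common SAE encoder convention act = ReLU(w·x + b): two units sharing one encoder vector w but with different biases fire on nested input sets, since each fires iff w·x exceeds -b. (The direction of "lower bias" vs. superset depends on sign convention; the point is strict nesting.)

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared encoder vector, two different biases (toy values, not from the paper)
w = rng.normal(size=16)
w /= np.linalg.norm(w)
b_loose, b_strict = -0.1, -0.6  # more negative bias = higher firing threshold

def fires(x, b):
    # Unit is "active" when its pre-ReLU activation is positive
    return float(w @ x + b) > 0.0

xs = rng.normal(size=(10_000, 16))
strict_set = {i for i, x in enumerate(xs) if fires(x, b_strict)}
loose_set = {i for i, x in enumerate(xs) if fires(x, b_loose)}

# The higher-threshold unit's firing set is strictly inside the other's
print(strict_set <= loose_set, len(strict_set), len(loose_set))
```

So "same direction, different bias" really does give you distinguishable (nested, not identical) firing patterns, which is why identical decoder directions alone don't prove the units are redundant.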