Bart Bussmann comments on Do sparse autoencoders find “true features”?

Bart Bussmann 7 Mar 2024 14:08 UTC
I suspect the 0.05 peak might be the minimum cosine similarity achievable when you distribute 8192 vectors uniformly over a 512-dimensional space? I used a bit of a weird regularizer where I penalized:

mean cosine similarity + mean max cosine similarity + max max cosine similarity
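A rough sketch of both points, under stated assumptions. The Welch bound (a standard result) gives the smallest possible value of the largest pairwise |cosine similarity| among n unit vectors in d dimensions. For n = 8192 and d = 512 it is about 0.043, in the same range as the 0.05 peak. The penalty function is only a guess at the regularizer described above. It assumes the penalty is computed over the SAE's feature directions, uses signed off-diagonal cosine similarities, and takes the "max" per feature over all other features. The names `welch_bound` and `spread_penalty` are hypothetical. This pure-Python version only evaluates the penalty. Actual training would compute it in an autodiff framework.

```python
import math


def welch_bound(n: int, d: int) -> float:
    """Lower bound on the largest |cosine similarity| between any two of n unit vectors in R^d (n > d)."""
    return math.sqrt((n - d) / (d * (n - 1)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pairwise_cos(vectors):
    """Cosine similarity matrix; the diagonal (self-similarity) is stored as None."""
    unit = [normalize(v) for v in vectors]
    n = len(unit)
    return [[None if i == j else sum(a * b for a, b in zip(unit[i], unit[j]))
             for j in range(n)] for i in range(n)]


def spread_penalty(vectors):
    """Hypothetical penalty: mean cos sim + mean max cos sim + max max cos sim.

    Assumes signed, off-diagonal similarities; 'max' is each vector's most similar other vector.
    """
    sims = pairwise_cos(vectors)
    off_diag = [s for row in sims for s in row if s is not None]
    max_per_vector = [max(s for s in row if s is not None) for row in sims]
    mean_cos = sum(off_diag) / len(off_diag)
    mean_max = sum(max_per_vector) / len(max_per_vector)
    max_max = max(max_per_vector)
    return mean_cos + mean_max + max_max


print(round(welch_bound(8192, 512), 4))  # roughly 0.0428
```

The three terms pull in different directions. The mean term spreads the whole dictionary. The mean-max term pushes every feature away from its closest neighbour. The max-max term targets the single worst-packed pair.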

I will check later whether the features in the 0.3 peak all have the same neighbour.
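One way such a check could be run is sketched below. The approach finds each feature's most cosine-similar other feature. It keeps only features whose maximum similarity falls inside a window around the peak, and then counts how often each neighbour index appears. A single dominant index would mean the peak features share one neighbour. The window bounds (for example 0.25 to 0.35) and the function names are illustrative assumptions, not part of the original comment.

```python
import math
from collections import Counter


def nearest_neighbours(vectors):
    """For each vector, return (index of its most cosine-similar other vector, that similarity)."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    result = []
    for i, u in enumerate(unit):
        best_j, best_sim = None, -math.inf
        for j, w in enumerate(unit):
            if i == j:
                continue  # skip self-similarity
            sim = sum(a * b for a, b in zip(u, w))
            if sim > best_sim:
                best_j, best_sim = j, sim
        result.append((best_j, best_sim))
    return result


def neighbours_in_peak(vectors, low, high):
    """Count neighbour indices among features whose max cosine similarity lies in [low, high]."""
    return Counter(j for j, sim in nearest_neighbours(vectors) if low <= sim <= high)
```

For example, `neighbours_in_peak(features, 0.25, 0.35)` returning a `Counter` with one large entry would suggest that most features in the peak point at the same neighbour.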
- Demian Till 10 Mar 2024 14:10 UTC
  Nice, that’s promising! It would also be interesting to see how those peaks are affected when you retrain the SAE both on the same target model and on different target models.