Alexander Gietelink Oldenziel comments on Showing SAE Latents Are Not Atomic Using Meta-SAEs

Alexander Gietelink Oldenziel 24 Aug 2024 13:38 UTC
2 points
I’m curious if these observations are related at all to the work by Mendel, Hanni and Vaintrob on SAE features, more discussion here.
- Neel Nanda 24 Aug 2024 15:50 UTC
  2 points
  Is the first post the one you meant to link, or did you mean the follow-up post from Jake? The first post is on toy models of ANDs and XORs, which I don’t see as being super relevant. But I think Jake’s argument that there’s clear structure that naive hypotheses neglect seems clearly legit.