Hey! Thanks for doing this research.
Lee Sharkey et al. did a similar experiment a while back with a much larger number of features & dimensions, and there were hyperparameters that perfectly reconstructed the original dataset (as you predicted would happen as N increases).
Hoagy still hosts a version of our replication here (though I haven’t looked at that code in a year!).
Hi Logan! Thanks for pointing me towards that post—I’ve been meaning to get around to reading it in detail and just finally did. Glad to see that the large-N limit seems to get perfect reconstruction for at least one similar toy experiment! And thanks for sharing the replication code.
I’m particularly keen to learn a bit more about the correlated features—have you (or has anyone you know of) studied toy models with a few features that are REALLY correlated with one another, and that basically never appear alongside other features? I’m wondering if such features could bring back the problem that we saw here, even in a very high-dimensional model / dataset. Most of the metrics in that post are averaged over all features, so they don’t really differentiate between correlated and uncorrelated features.
Agreed. You would need to change the correlation code to hardcode the feature correlations; then you could zoom in on those two features when computing the max cosine similarity.
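To make that concrete, here's a minimal sketch of the idea (not the actual replication code—the feature counts, sparsity levels, and autoencoder hyperparameters are all placeholder guesses): hardcode one pair of features that always co-occur and never fire with the rest, train a small sparse autoencoder on the superposed data, and then report max cosine similarity for just that pair versus the remaining features.

```python
# Hypothetical sketch: hardcode a strongly correlated feature pair in the
# synthetic data, train a tiny sparse autoencoder, and zoom in on the pair's
# max cosine similarity. All sizes/hyperparameters are illustrative guesses.
import torch

torch.manual_seed(0)
n_features, n_dims, n_samples = 64, 16, 50_000
p_active = 0.02   # baseline firing probability for the uncorrelated features
p_pair = 0.05     # probability that the hardcoded pair fires (together)

# Ground-truth feature directions (rows), unit norm.
true_dirs = torch.nn.functional.normalize(torch.randn(n_features, n_dims), dim=1)

# Sparse activations: features 0 and 1 are forced to co-occur and to be
# mutually exclusive with every other feature.
acts = (torch.rand(n_samples, n_features) < p_active).float()
pair_on = (torch.rand(n_samples) < p_pair).float()
acts[:, 0] = pair_on
acts[:, 1] = pair_on
acts[pair_on.bool(), 2:] = 0.0               # the pair never appears with other features
acts *= torch.rand(n_samples, n_features)    # random magnitudes when active

data = acts @ true_dirs                      # observed superposed representations

# Tiny sparse autoencoder with an L1 penalty on the hidden activations.
enc = torch.nn.Linear(n_dims, n_features)
dec = torch.nn.Linear(n_features, n_dims, bias=False)
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
for step in range(2_000):
    batch = data[torch.randint(n_samples, (256,))]
    hidden = torch.relu(enc(batch))
    loss = (dec(hidden) - batch).pow(2).mean() + 1e-3 * hidden.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Zoom in on the hardcoded pair: max cosine similarity between each true
# direction and the learned decoder directions, reported per feature.
learned = torch.nn.functional.normalize(dec.weight.T.detach(), dim=1)  # (n_features, n_dims)
max_cos = (true_dirs @ learned.T).max(dim=1).values
print("max cos sim, correlated pair:", max_cos[:2].tolist())
print("max cos sim, mean over the rest:", max_cos[2:].mean().item())
```

If the worry above is right, the pair's max cosine similarity should lag the average even when N is large, because the dictionary can get away with a single direction for the two features.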