Dan Braun comments on Improving Dictionary Learning with Gated Sparse Autoencoders

Dan Braun 26 Apr 2024 8:44 UTC
6 points
2
This is neat, nice work!
I’m finding it quite hard to get a sense of what the actual Loss Recovered numbers you report are, and to compare them concretely to other work. If possible, it’d be very helpful if you shared:
1. What the zero-ablation CE scores are for each model and SAE position. (I assume it’s much worse for the MLP and attention outputs than the residual stream?)
2. What the baseline CE scores are for each model.
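For context on why both numbers matter, here is a minimal sketch of how Loss Recovered is commonly computed in SAE work. It assumes the usual definition, i.e. the fraction of the gap between the zero-ablation CE and the clean baseline CE that is closed when the SAE reconstruction is spliced in; the paper's exact definition may differ. The function name and every number below are hypothetical, chosen only for illustration.

```python
def loss_recovered(ce_clean: float, ce_sae: float, ce_zero: float) -> float:
    """Fraction of the zero-ablation loss gap closed by the SAE.

    Assumed definition (not taken from the paper):
        (ce_zero - ce_sae) / (ce_zero - ce_clean)

    ce_clean: CE loss of the unmodified model (the baseline).
    ce_sae:   CE loss with the SAE reconstruction spliced in.
    ce_zero:  CE loss with the activation at that site zero-ablated.
    """
    gap = ce_zero - ce_clean
    if gap <= 0:
        raise ValueError("zero-ablation CE must exceed the baseline CE")
    return (ce_zero - ce_sae) / gap


# Hypothetical numbers: the same 0.2 nat degradation from the SAE,
# measured against two different zero-ablation losses.
small_gap = loss_recovered(ce_clean=3.0, ce_sae=3.2, ce_zero=5.0)
large_gap = loss_recovered(ce_clean=3.0, ce_sae=3.2, ce_zero=13.0)
print(f"small zero-ablation gap: {small_gap:.2f}")
print(f"large zero-ablation gap: {large_gap:.2f}")
```

Under this assumed definition, an identical increase in CE yields a higher Loss Recovered wherever zero ablation is more damaging, so the percentage alone is hard to compare across sites or papers without the two reference losses.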
- Arthur Conmy 29 Apr 2024 18:17 UTC
  3 points
  0
  Parent
  Thanks for the feedback! We will post an update to the paper with all these numbers in tables tomorrow night. For now I have sent them to you (and can send them to anyone else who wants them in the next 24 hours).