Sodium comments on Lucius Bushnaq’s Shortform

Sodium 6 Sep 2024 6:32 UTC
2 points
2
Have people done evals for a model with/without an SAE inserted? Seems like even just looking at drops in MMLU performance by category could be non-trivially informative.
- Lucius Bushnaq 6 Sep 2024 7:19 UTC
  2 points
  2
  I’ve seen a little bit of this, but nowhere near as much as I think the topic merits. I agree that systematic studies on where and how the reconstruction errors make their effects known might be quite informative.
  Basically, whenever people train SAEs, or use some other approximate model decomposition that degrades performance, I think they should ideally spend some time afterwards just playing with the degraded model and talking to it, to figure out in what ways it is worse.
  - Sodium 6 Sep 2024 16:26 UTC
    1 point
    0
    Hmmm ok maybe I’ll take a look at this :)
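
For anyone wanting to try the per-category comparison discussed in this thread, here is a minimal sketch of the bookkeeping only. It assumes you have already run the same benchmark questions through the base model and through the model with an SAE spliced in, and recorded whether each answer was correct. The function name `accuracy_drop_by_category`, the record layout, and the example records are all hypothetical and made up for illustration; they are not results from any real model or eval.

```python
from collections import defaultdict


def accuracy_drop_by_category(records):
    """Return {category: base accuracy minus SAE-inserted accuracy}.

    `records` is an iterable of (category, base_correct, sae_correct)
    tuples, one per benchmark question. This layout is an assumption
    made for this sketch, not a standard format.
    """
    # Per category: [question count, base-model hits, SAE-model hits]
    totals = defaultdict(lambda: [0, 0, 0])
    for category, base_correct, sae_correct in records:
        counts = totals[category]
        counts[0] += 1
        counts[1] += int(base_correct)
        counts[2] += int(sae_correct)

    drops = {}
    for category, (n, base_hits, sae_hits) in totals.items():
        # Positive value: the model got worse with the SAE inserted.
        drops[category] = (base_hits - sae_hits) / n
    return drops


# Made-up records purely to show the shape of the input.
records = [
    ("math", True, False),
    ("math", True, True),
    ("history", True, True),
    ("history", False, False),
]

drops = accuracy_drop_by_category(records)

# List categories from largest drop to smallest.
for category, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {drop:+.2f}")
```

Sorting by the size of the drop puts the categories most affected by reconstruction error at the top, which is a natural place to start when poking at the degraded model by hand.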