Did y’all do any ablations on your loss terms? For example:
1. JumpReLU → ReLU
2. L0 (w/ STE) → L1
I’d be curious to see if the Pareto improvements and high-frequency features are due to one, the other, or both.
You can’t do JumpReLU → ReLU in the current setup: there would be no threshold parameters to train with the STE, so it would essentially train a ReLU SAE with no sparsity penalty. In principle you could train the other SAE parameters by adding more pseudo-derivatives, but we found this didn’t work as well as just training the threshold, so there was no point in trying this ablation.

L0 → L1 leads to worse Pareto curves (I can’t remember off the top of my head which side of Gated it falls on, but it’s definitely not significantly better than Gated). It’s a good question whether this resolves the high-frequency features; my guess is it would, since I think Gated SAEs basically approximate JumpReLU SAEs with an L1 loss, but we didn’t check this.
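To make the "threshold trained with the STE" point concrete, here is a minimal per-unit sketch of a JumpReLU activation, its L0 penalty, and the kernel-based pseudo-derivatives with respect to the threshold. This is an illustrative toy, not the paper's implementation; the function names, the rectangular kernel, and the `eps` bandwidth are my assumptions.

```python
def jumprelu(z, theta):
    """JumpReLU: pass z through unchanged iff it exceeds the threshold theta."""
    return z if z > theta else 0.0

def l0(z, theta):
    """Per-unit L0 penalty: 1 if the unit fires, else 0."""
    return 1.0 if z > theta else 0.0

def rect_kernel(x):
    """Rectangular kernel used to smooth the step function for the STE."""
    return 1.0 if abs(x) <= 0.5 else 0.0

def pseudo_grad_theta(z, theta, eps=1e-3):
    """STE pseudo-derivatives w.r.t. theta.

    The true derivative of both functions w.r.t. theta is zero almost
    everywhere (they are step functions in theta), so no gradient would
    reach the threshold; the STE replaces it with a kernel bump of width
    eps around z = theta. Returns (d jumprelu / d theta, d l0 / d theta).
    """
    k = rect_kernel((z - theta) / eps) / eps
    return -theta * k, -k
```

Without these pseudo-derivatives the thresholds get no training signal at all, which is why swapping JumpReLU for a plain ReLU (no thresholds) leaves nothing for the STE to act on.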