Fengyuan Hu comments on Addressing Feature Suppression in SAEs

Fengyuan Hu 7 Apr 2024 2:20 UTC
1 point
0
Thanks for your amazing work! In theory, I would expect layers with higher input norms to have lower SAE L2 ratios, since higher input norms correspond to larger feature activations, which are penalized more heavily. Does your data confirm this hypothesis?
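For concreteness, here is a rough sketch of how this could be checked. It assumes the L2 ratio is the norm of the SAE reconstruction divided by the norm of the input, averaged over tokens; that definition and all function names below are my own assumptions for illustration, not taken from the post's code.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a single activation vector."""
    return math.sqrt(sum(v * v for v in vec))


def l2_ratio(inputs, reconstructions):
    """Mean over tokens of ||reconstruction|| / ||input|| (assumed definition)."""
    ratios = [l2_norm(x_hat) / l2_norm(x) for x, x_hat in zip(inputs, reconstructions)]
    return sum(ratios) / len(ratios)


def mean_input_norm(inputs):
    """Mean L2 norm of the SAE inputs for one layer."""
    return sum(l2_norm(x) for x in inputs) / len(inputs)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def norm_ratio_correlation(per_layer):
    """per_layer: one (inputs, reconstructions) pair per layer (hypothetical layout).

    Returns the correlation between mean input norm and L2 ratio across layers.
    """
    norms = [mean_input_norm(x) for x, _ in per_layer]
    ratios = [l2_ratio(x, x_hat) for x, x_hat in per_layer]
    return pearson(norms, ratios)
```

A clearly negative correlation across layers would be consistent with the hypothesis; a value near zero or positive would count against it.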