Thanks for your amazing work! In theory, I would expect layers with higher input norms to have lower SAE L2 ratios, since those layers correspond to larger feature activations, which are penalized more heavily. I wonder whether your data confirms this hypothesis.