Nah I think it’s pretty sketchy. I personally prefer mean ablation, especially for residual stream SAEs where zero ablation is super damaging. But even there I agree. Reporting the compute efficiency hit would be nice, though it’s a pain to get the scaling laws precise enough.
For our paper this is irrelevant though IMO because we’re comparing gated and normal SAEs, and I think this is just scaling by a constant? It’s at least monotonic in CE loss degradation
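To spell out the "scaling by a constant" point, here's a rough sketch. It assumes the metric in question is a loss-recovered style score, i.e. the SAE's CE loss normalised by the gap between an ablation baseline and the clean model. That reading is an assumption, and the function name and all the numbers are made up for illustration, not results from the paper. For a fixed model and site the baseline gap is a constant, so the score is just an affine function of CE loss degradation, and swapping zero ablation for mean ablation shouldn't change which SAE ranks higher.

```python
def loss_recovered(ce_clean: float, ce_sae: float, ce_ablated: float) -> float:
    """Fraction of the ablation-induced CE loss gap that the SAE closes.

    Hypothetical helper: 1.0 means the SAE matches the clean model,
    0.0 means it is no better than the ablation baseline.
    """
    return (ce_ablated - ce_sae) / (ce_ablated - ce_clean)


# Illustrative numbers only (assumed, not measured).
ce_clean = 3.0  # CE loss of the unmodified model
baselines = {"zero": 10.0, "mean": 5.0}  # CE loss with the site ablated
sae_ce = {"normal": 3.30, "gated": 3.15}  # CE loss with each SAE spliced in

scores_by_baseline = {
    abl: {name: loss_recovered(ce_clean, ce, ce_abl) for name, ce in sae_ce.items()}
    for abl, ce_abl in baselines.items()
}

for abl, scores in scores_by_baseline.items():
    # Same SAE wins under either baseline, because
    # 1 - score = (ce_sae - ce_clean) / (ce_ablated - ce_clean).
    best = max(scores, key=scores.get)
    print(abl, scores, "best:", best)
```

The absolute scores shift a lot with the baseline, which is the sketchy part, but the gated vs normal ordering is set entirely by CE loss degradation.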