Cool work—figuring out how much of scaling up autoencoders is discovering new features vs splitting existing ones feels quite important. Especially since for any one scale of autoencoder there are simultaneously features which are split too finely and features which are too rare to yet be discovered, it seems quite plausible that the most useful autoencoders will be ones with features stitched together from multiple scales.
Some minor nitpicks: I would recommend always judging MSE/L0 in terms of the frontier between the two, rather than either alone; in my experiments I found it very easy to misjudge at a glance whether a run with better MSE but worse L0 actually landed above or below the frontier.
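For concreteness, here's a rough sketch of the kind of check I mean (the run numbers and the frontier_mse_at helper are made up purely for illustration): interpolate the existing frontier at the candidate's L0 and compare against that, rather than eyeballing the two metrics separately.

```python
import numpy as np

def frontier_mse_at(l0, frontier_pts):
    """Interpolate the current frontier's MSE at a given L0.
    frontier_pts are (L0, MSE) pairs from existing runs (lower is better on both axes);
    plain linear interpolation here, though interpolating in log-L0 is also reasonable."""
    l0s, mses = zip(*sorted(frontier_pts))
    return float(np.interp(l0, l0s, mses))

# Made-up numbers purely for illustration.
frontier_pts = [(20, 0.115), (40, 0.090), (80, 0.072)]
candidate = (60, 0.085)   # better MSE than the L0=40 run, but worse L0

expected_mse = frontier_mse_at(candidate[0], frontier_pts)   # ~0.081 at L0=60
print(candidate[1] < expected_mse)
# False: despite beating the L0=40 run on MSE, this run sits above (worse than) the frontier.
```

The exact interpolation scheme matters less than always making the comparison against the curve rather than against a single neighbouring run.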
Thanks!
Yeah, I think that’s fair, and I don’t necessarily think that stitching multiple SAEs is a great way to move the Pareto frontier of MSE/L0 (although some tentative experiments suggested they might serve as a good initialization if the stitched SAE is then fully retrained; rough sketch at the end of this comment).
However, I don’t think that low L0 should be a goal in itself when training SAEs: L0 mainly serves as a proxy for the interpretability of the features, for lack of other good feature-quality metrics. Since stitching features doesn’t change the interpretability of the individual features, I’m not sure how useful/important the L0 metric still is in this context.
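To make the "stitching as initialization" idea above concrete, here is a toy sketch. The SAE class and the stitch_init helper are my own illustrative constructions (plain ReLU SAEs, arbitrary choice of which feature indices to keep), not the exact procedure from the post:

```python
import torch
import torch.nn as nn

class SAE(nn.Module):
    """Toy SAE: x -> ReLU(x W_enc^T + b_enc) -> f W_dec^T + b_dec."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_hidden, d_model) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_hidden))
        self.W_dec = nn.Parameter(torch.randn(d_model, d_hidden) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x):
        f = torch.relu(x @ self.W_enc.T + self.b_enc)
        return f @ self.W_dec.T + self.b_dec

def stitch_init(small: SAE, large: SAE, keep_idx_small, keep_idx_large) -> SAE:
    """Build a new SAE whose dictionary is the union of selected features from a
    small and a large SAE, intended as an initialization for further training.
    keep_idx_* are the feature indices to take from each SAE (illustrative)."""
    d_model = small.W_dec.shape[0]
    d_hidden = len(keep_idx_small) + len(keep_idx_large)
    stitched = SAE(d_model, d_hidden)
    with torch.no_grad():
        stitched.W_enc.copy_(torch.cat([small.W_enc[keep_idx_small],
                                        large.W_enc[keep_idx_large]], dim=0))
        stitched.b_enc.copy_(torch.cat([small.b_enc[keep_idx_small],
                                        large.b_enc[keep_idx_large]], dim=0))
        stitched.W_dec.copy_(torch.cat([small.W_dec[:, keep_idx_small],
                                        large.W_dec[:, keep_idx_large]], dim=1))
        stitched.b_dec.copy_(small.b_dec)  # arbitrary choice: reuse the small SAE's decoder bias
    return stitched

# Example usage (sizes and index choices are arbitrary):
small = SAE(d_model=512, d_hidden=4096)
large = SAE(d_model=512, d_hidden=32768)
stitched = stitch_init(small, large,
                       keep_idx_small=list(range(1000)),
                       keep_idx_large=list(range(2000)))
# ...then train `stitched` as usual; that retraining is the part the tentative experiments suggested helps.
```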