Basically agree. I'm generally a strong supporter of measuring the loss drop in terms of effective compute. Loss recovered relative to a zero-ablation baseline is really quite wonky and gives misleadingly big numbers.
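To make the contrast concrete, here is a minimal sketch with made-up numbers. The loss values, the power-law form L(C) = a * C^(-b) + L_irreducible, and its coefficients are all illustrative assumptions rather than measurements from any real model or SAE.

```python
def loss_recovered(clean_loss, sae_loss, zero_ablation_loss):
    """Fraction of the zero-ablation loss gap that remains closed with the SAE spliced in."""
    return (zero_ablation_loss - sae_loss) / (zero_ablation_loss - clean_loss)


def effective_compute_fraction(clean_loss, sae_loss, a, b, irreducible):
    """Compute of a model matching the SAE-spliced loss, as a fraction of the original.

    Assumes a hypothetical scaling law L(C) = a * C**(-b) + irreducible.
    """
    def compute_for(loss):
        # Invert the assumed scaling law to get compute from loss.
        return ((loss - irreducible) / a) ** (-1.0 / b)

    return compute_for(sae_loss) / compute_for(clean_loss)


# Illustrative numbers only (assumed, not measured).
clean, spliced, zero_abl = 3.00, 3.10, 10.0
recovered = loss_recovered(clean, spliced, zero_abl)
frac = effective_compute_fraction(clean, spliced, a=10.0, b=0.05, irreducible=1.5)

print(f"loss recovered: {recovered:.3f}")
print(f"effective compute fraction: {frac:.3f}")
```

Under these assumed numbers, loss recovered comes out near 0.99 while the effective compute fraction is under 0.3, because the zero-ablation loss sits so far above the clean loss that almost any reconstruction closes most of the gap.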
I also agree that reconstruction is not the only axis of SAE quality we care about. I propose explainability as the other axis: whether we can write explanations that are both necessary and sufficient for when individual latents activate. Progress then looks like pushing out the Pareto frontier between reconstruction and explainability.
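As a rough illustration of what the frontier means here, the sketch below picks out the non-dominated SAEs from a list of (reconstruction, explainability) scores. Both scores are assumed to be "higher is better", and the values are hypothetical placeholders, not results from any actual SAE.

```python
def pareto_frontier(points):
    """Return the points that no other point beats on both axes (higher is better)."""
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier)


# Hypothetical (reconstruction_score, explainability_score) pairs, one per SAE.
saes = [(0.70, 0.90), (0.80, 0.85), (0.75, 0.80), (0.90, 0.60), (0.85, 0.55)]
frontier = pareto_frontier(saes)
print(frontier)
```

A new SAE counts as progress in this picture if it lands outside the current frontier, that is, if no existing SAE matches or beats it on both axes at once.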