leogao comments on Lucius Bushnaq’s Shortform

leogao 6 Sep 2024 4:19 UTC
10 points
Basically agree. I'm generally a strong supporter of measuring the loss drop in terms of effective compute. Loss recovered relative to a zero-ablation baseline is really quite wonky and gives misleadingly big numbers.
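To make the contrast concrete, here is a rough sketch of the two metrics. Everything in it is an assumption for illustration: the function names are hypothetical, the losses are made-up numbers, and the effective-compute conversion assumes a simple power law of the form L(C) = L_inf + a * C^(-alpha) with an illustrative exponent, not a fit to any real model family.

```python
def fraction_loss_recovered(clean_loss, spliced_loss, zero_ablation_loss):
    """Share of the gap between zero-ablation and the clean model that the SAE closes."""
    return (zero_ablation_loss - spliced_loss) / (zero_ablation_loss - clean_loss)


def effective_compute_fraction(clean_loss, spliced_loss, irreducible_loss, alpha):
    """Compute multiplier implied by the spliced-in loss.

    Assumes L(C) = irreducible_loss + a * C**(-alpha), so that
    C is proportional to (L - irreducible_loss) ** (-1 / alpha).
    """
    ratio = (spliced_loss - irreducible_loss) / (clean_loss - irreducible_loss)
    return ratio ** (-1.0 / alpha)


# Made-up numbers, in nats per token (illustrative only).
clean = 3.0          # original model
spliced = 3.3        # model with SAE reconstruction spliced in
zero_abl = 10.0      # model with the activation zero-ablated
irreducible = 1.7    # assumed irreducible loss of the scaling law
alpha = 0.05         # assumed compute exponent

recovered = fraction_loss_recovered(clean, spliced, zero_abl)
compute = effective_compute_fraction(clean, spliced, irreducible, alpha)
print(f"loss recovered vs zero-ablation: {recovered:.1%}")
print(f"effective compute fraction:      {compute:.1%}")
```

With these invented numbers the SAE "recovers" over 95% of the loss against zero-ablation, while the same loss gap corresponds to a model trained with only a couple of percent of the compute under the assumed scaling law. The exact figures depend entirely on the assumed constants; the point is only that the two metrics can tell very different stories.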

I also agree that reconstruction is not the only axis of SAE quality we care about. I propose explainability as the other axis: whether we can write explanations that are both necessary and sufficient for when individual latents activate. Progress then looks like pushing out the Pareto frontier of reconstruction and explainability.
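As a minimal sketch of what "pushing the Pareto frontier" means here, suppose each SAE is summarized by a reconstruction loss (lower is better) and some explainability score (higher is better). The scoring itself is left abstract; the names and numbers below are hypothetical.

```python
def pareto_frontier(saes):
    """Return names of SAEs that no other SAE beats on both axes.

    saes: list of (name, loss, explainability) tuples, where lower loss
    and higher explainability are better.
    """
    frontier = []
    for name, loss, expl in saes:
        dominated = any(
            other_loss <= loss
            and other_expl >= expl
            and (other_loss < loss or other_expl > expl)
            for _, other_loss, other_expl in saes
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical SAEs: (name, spliced-in loss, explainability score in [0, 1]).
saes = [
    ("A", 3.1, 0.5),
    ("B", 3.2, 0.7),
    ("C", 3.3, 0.6),  # worse than B on both axes
    ("D", 3.5, 0.9),
]
frontier = pareto_frontier(saes)
print(frontier)
```

A new SAE counts as progress under this framing if it lands on the frontier and knocks at least one existing point off it, or extends the frontier into a region no existing SAE covers.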