Edit: As per @Logan Riggs’s comment, I seem to have misunderstood what was meant by ‘loss recovered’, so this comment is not relevant.
Cool post! However, it feels a little early to conclude that
> Conceptually, loss recovered seems a worse metric than KL divergence.
In toy settings (i.e. applying SAEs to a standard sparse coding setup where we know the ground-truth factors, as in https://www.lesswrong.com/posts/z6QQJbtpkEAX3Aojj/interim-research-report-taking-features-out-of-superposition/ ), SAEs do not achieve zero reconstruction loss even when they recover the ground-truth overbasis with high mean max cosine similarity (and the situation is even worse when noise is present). It has never seemed obvious to me that we should aim for SAE reconstruction loss to go to zero as we train better SAEs: we could plausibly still use the basis the SAEs extract without plugging the SAE into a ‘production’ system for mech interp (if we did plug it in, we would want good reconstructions).
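For concreteness, here is a minimal sketch of how mean max cosine similarity (MMCS) could be computed in a toy setting like this. The function name, the array shapes, and the choice to take the max over learned features for each ground-truth feature are my assumptions for illustration, not necessarily the exact procedure used in the linked report.

```python
import numpy as np

def mean_max_cosine_similarity(ground_truth, learned):
    """Hypothetical helper: rows of each array are feature directions.

    For every ground-truth feature, take the highest cosine similarity
    to any learned feature, then average over ground-truth features.
    """
    # Normalise rows so the dot product equals cosine similarity.
    gt = ground_truth / np.linalg.norm(ground_truth, axis=1, keepdims=True)
    ld = learned / np.linalg.norm(learned, axis=1, keepdims=True)
    sims = gt @ ld.T  # shape: (n_ground_truth, n_learned)
    return float(sims.max(axis=1).mean())

# Toy overcomplete basis: 8 feature directions in 4 dimensions.
rng = np.random.default_rng(0)
truth = rng.normal(size=(8, 4))

# A "learned" dictionary that is the same basis, permuted and rescaled.
learned = 3.0 * truth[rng.permutation(8)]

mmcs = mean_max_cosine_similarity(truth, learned)
print(mmcs)  # close to 1.0, since every direction is recovered
```

Note that this metric only compares directions: it is unaffected by the ordering or scale of the learned features, so on its own it says nothing about how small the reconstruction error is.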
Seems tangential. I interpreted loss recovered as CE-related (not reconstruction-related).