Here’s a wandb report that includes plots for the KL divergence. e2e+downstream indeed performs better for layer 2. So it’s possible that intermediate losses might help training a little. But I wouldn’t be surprised if better hyperparams eliminated this difference; we put more effort into optimising the SAE_local hyperparams than into the SAE_e2e and SAE_e2e+ds hyperparams.
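For reference, here is a minimal sketch of how a KL metric like this is typically computed, assuming it is the KL divergence between the original model's output logits and the logits produced when the SAE reconstruction is spliced into the forward pass (the function name and tensor shapes are illustrative, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def kl_to_original(orig_logits: torch.Tensor, sae_logits: torch.Tensor) -> torch.Tensor:
    """KL(P_orig || P_sae), averaged over batch and sequence positions.

    Both tensors have shape (batch, seq, vocab): `orig_logits` from the
    unmodified model, `sae_logits` from a forward pass where the layer's
    activations are replaced by the SAE reconstruction.
    """
    orig_logprobs = F.log_softmax(orig_logits, dim=-1)
    sae_logprobs = F.log_softmax(sae_logits, dim=-1)
    # F.kl_div computes KL(target || input); with log_target=True both
    # arguments are log-probabilities.
    kl_per_token = F.kl_div(
        sae_logprobs, orig_logprobs, log_target=True, reduction="none"
    ).sum(dim=-1)
    return kl_per_token.mean()
```

A lower value means the SAE-patched model's next-token distribution stays closer to the original model's, which is the axis the plots in the report compare across SAE_local, SAE_e2e, and SAE_e2e+ds.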