Here’s a wandb report that includes plots for the KL divergence. e2e+downstream indeed performs better for layer 2. So it’s possible that intermediate losses might help training a little. But I wouldn’t be surprised if better hyperparams eliminated this difference; we put more effort into optimising the SAE_local hyperparams than into the SAE_e2e and SAE_e2e+ds hyperparams.
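For reference, here is a minimal sketch of how a KL metric like this is typically computed, assuming it is the KL divergence between the original model's output logits and the logits produced when the SAE reconstruction is spliced into the forward pass (the function name and tensor shapes are illustrative, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def kl_to_original(orig_logits: torch.Tensor, sae_logits: torch.Tensor) -> torch.Tensor:
    """KL(P_orig || P_sae), averaged over batch and sequence positions.

    Both tensors have shape (batch, seq, vocab): `orig_logits` from the
    unmodified model, `sae_logits` from a forward pass where the layer's
    activations are replaced by the SAE reconstruction.
    """
    orig_logprobs = F.log_softmax(orig_logits, dim=-1)
    sae_logprobs = F.log_softmax(sae_logits, dim=-1)
    # F.kl_div computes KL(target || input); with log_target=True both
    # arguments are log-probabilities.
    kl_per_token = F.kl_div(
        sae_logprobs, orig_logprobs, log_target=True, reduction="none"
    ).sum(dim=-1)
    return kl_per_token.mean()
```

A lower value means the SAE-patched model's next-token distribution stays closer to the original model's, which is the axis the plots in the report compare across SAE_local, SAE_e2e, and SAE_e2e+ds.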