Senthooran Rajamanoharan comments on Improving Dictionary Learning with Gated Sparse Autoencoders

Senthooran Rajamanoharan 25 Apr 2024 22:57 UTC
Hey Sam, thanks, you’re right. The reconstruction bias is actually defined as the argmin over $\gamma'$ of
$\mathbb{E}\left[\|\hat{x}/\gamma' - x\|_2^2\right]$
which I’d (incorrectly) rearranged as the expression in the paper. As a result, the optimum is
$\gamma^{-1} = \mathbb{E}[\hat{x} \cdot x] \,/\, \mathbb{E}\left[\|\hat{x}\|_2^2\right]$
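As a sanity check on this closed form, here is a minimal Python sketch (not code from the paper) that estimates $\gamma$ from paired samples of inputs `x` and reconstructions `x_hat`. The helper names (`dot`, `reconstruction_bias`, `rescaled_loss`) and the toy data, where reconstructions are shrunk to 0.8× the input plus Gaussian noise, are hypothetical and chosen only for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def reconstruction_bias(x_hat, x):
    """Closed form: gamma^{-1} = E[x_hat . x] / E[||x_hat||^2], so gamma is the inverse."""
    n = len(x)
    mean_cross = sum(dot(xh, xi) for xh, xi in zip(x_hat, x)) / n
    mean_sq_norm = sum(dot(xh, xh) for xh in x_hat) / n
    return mean_sq_norm / mean_cross


def rescaled_loss(x_hat, x, gamma_prime):
    """Empirical E[||x_hat / gamma' - x||^2], the objective minimised over gamma'."""
    total = 0.0
    for xh, xi in zip(x_hat, x):
        total += sum((a / gamma_prime - b) ** 2 for a, b in zip(xh, xi))
    return total / len(x)


# Hypothetical toy data: reconstructions shrunk by a factor of 0.8, plus small noise.
random.seed(0)
dim, n_samples = 16, 500
x = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
x_hat = [[0.8 * a + random.gauss(0.0, 0.1) for a in xi] for xi in x]

gamma = reconstruction_bias(x_hat, x)
print(f"gamma = {gamma:.3f}")
print(f"loss at gamma = {rescaled_loss(x_hat, x, gamma):.4f}")
print(f"loss at gamma' = 1 = {rescaled_loss(x_hat, x, 1.0):.4f}")
```

Because the empirical loss is a quadratic in $1/\gamma'$, the closed-form value is its exact minimiser, so the loss at `gamma` should be no larger than at any other $\gamma'$. A value of $\gamma$ below one indicates shrinkage (for an exactly scaled reconstruction $\hat{x} = s\,x$, the formula gives $\gamma = s$).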
Separately, the derivation we gave was not quite right: I’d incorrectly substituted the optimised loss rather than the original reconstruction loss, which makes equation (10) incorrect. However, the difference between the two is small exactly when γ is close to one (and vanishes when there is no shrinkage, i.e. γ = 1), which is probably why we didn’t pick this up. Anyway, we plan to correct these two equations, update the graphs, and submit a revised version.
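To see why the gap vanishes without shrinkage, here is a short derivation using only the definition of γ above; it is a sketch, not the paper's corrected equation (10), and it assumes "optimised loss" means the loss after rescaling by the optimal γ. Write $A = \mathbb{E}[\|\hat{x}\|_2^2]$, $B = \mathbb{E}[\hat{x} \cdot x]$, $C = \mathbb{E}[\|x\|_2^2]$ and $c = 1/\gamma'$, so the original reconstruction loss is $L(1)$ and the optimised loss is $L(c^\star)$:

$$
\begin{aligned}
L(c) &= \mathbb{E}\left[\|c\,\hat{x} - x\|_2^2\right] = c^2 A - 2cB + C, \\
c^\star &= \gamma^{-1} = B / A, \\
L(1) - L(c^\star) &= A - 2B + \frac{B^2}{A} = \frac{(A - B)^2}{A} = A\,\left(1 - \gamma^{-1}\right)^2.
\end{aligned}
$$

So the gap is quadratic in $1 - \gamma^{-1}$: second-order small when γ is near one, and exactly zero when γ = 1.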
- Senthooran Rajamanoharan 26 Apr 2024 12:38 UTC
  UPDATE: we’ve corrected equations 9 and 10 in the paper (screenshot of the draft below) and also added a footnote that hopefully helps clarify the derivation. I’ve also attached a revised figure 6, which shows that this doesn’t change the overall story (for the mathematical reasons I mentioned in my previous comment). These changes will go up on arXiv, along with some other minor ones (like remembering to mention the SAEs’ widths), likely at some point next week. Thanks again, Sam, for pointing this out!
  [Image: updated equations (draft)]
  [Image: updated figure 6 (shrinkage comparison for GELU-1L)]