Possibly I’m missing something, but if you don’t have $\mathcal{L}_{\text{aux}}$, then the only gradients to $\mathbf{W}_{\text{gate}}$ and $\mathbf{b}_{\text{gate}}$ come from $\mathcal{L}_{\text{sparsity}}$ (the binarizing Heaviside activation function kills gradients from $\mathcal{L}_{\text{reconstruct}}$), and so $\pi_{\text{gate}}$ would always end up non-positive to get a perfect zero sparsity loss. (That is, if you only optimize for $L_1$ sparsity, the obvious solution is “none of the features are active”.)
(You could use a smooth activation function as the gate, e.g. an element-wise sigmoid, and then you could just stick with $\mathcal{L}_{\text{incorrect}}$ from the beginning of Section 3.2.2.)
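To spell the gradient argument out a bit more formally (this is my reconstruction of the paper’s notation, so take the exact symbols with a grain of salt): the gate’s output is a step function of its pre-activations,

$$f_{\text{gate}}(\mathbf{x}) \;=\; \mathbb{1}\!\left[\pi_{\text{gate}}(\mathbf{x}) > 0\right],$$

which is piecewise constant in $\mathbf{W}_{\text{gate}}$ and $\mathbf{b}_{\text{gate}}$, so $\partial \mathcal{L}_{\text{reconstruct}} / \partial \mathbf{W}_{\text{gate}} = \mathbf{0}$ almost everywhere. The only remaining training signal for the gate is $\mathcal{L}_{\text{sparsity}} \propto \|\mathrm{ReLU}(\pi_{\text{gate}}(\mathbf{x}))\|_1$, which reaches its minimum of exactly zero once every component of $\pi_{\text{gate}}(\mathbf{x})$ is pushed below zero.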
Ah thanks, you’re totally right; that mostly resolves my confusion. I’m still a little bit dissatisfied, though, because the $\mathcal{L}_{\text{aux}}$ term is optimizing for something that we don’t especially want (i.e. for $\hat{\mathbf{x}}(\mathrm{ReLU}(\pi_{\text{gate}}(\mathbf{x})))$ to do a good job of reconstructing $\mathbf{x}$). But I do see how you do need some sort of reconstruction-esque term that actually allows gradients to pass through to the gate sub-layer.
Yep, the intuition here indeed was that $L_1$-penalised reconstruction seems to be okay for teaching a standard SAE’s encoder to detect which features are on (even if features get shrunk as a result), so that is effectively what this auxiliary loss is teaching the gate sub-layer to do, alongside the sparsity penalty. (The key difference being that we freeze the decoder in the auxiliary task, which the ablation study shows helps performance.) Maybe to put it another way, this was an auxiliary task that we had good evidence would teach the gate sub-layer to detect active features reasonably well, and it turned out to give good results in practice. It’s totally possible, though, that there are better auxiliary tasks (or even completely different loss functions) out there that we’ve not explored.
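For concreteness, here’s a minimal PyTorch-style sketch of how the frozen-decoder auxiliary term slots in alongside the other two losses; the variable names and exact reductions are illustrative rather than our actual training code:

```python
import torch

def gated_sae_losses(x, pi_gate, f_mag, W_dec, b_dec, l1_coeff):
    # Illustrative sketch of the three Gated SAE loss terms.
    #   x:        input activations, shape [batch, d_model]
    #   pi_gate:  gate pre-activations, shape [batch, n_features]
    #   f_mag:    magnitude sub-layer activations (post-ReLU), shape [batch, n_features]
    #   W_dec:    decoder weights, shape [n_features, d_model]
    #   b_dec:    decoder bias, shape [d_model]

    # Heaviside gate: decides which features are on. Its gradient is zero,
    # so the reconstruction term alone sends no signal to the gate parameters.
    f_gate = (pi_gate > 0).float()
    x_hat = (f_gate * f_mag) @ W_dec + b_dec
    l_reconstruct = (x - x_hat).pow(2).sum(-1).mean()

    # Sparsity penalty on the rectified gate pre-activations.
    l_sparsity = l1_coeff * torch.relu(pi_gate).sum(-1).mean()

    # Auxiliary task: reconstruct x from ReLU(pi_gate) through a *frozen*
    # copy of the decoder, so gradients do reach the gate parameters
    # (upstream of pi_gate) without this term reshaping the decoder itself.
    x_hat_frozen = torch.relu(pi_gate) @ W_dec.detach() + b_dec.detach()
    l_aux = (x - x_hat_frozen).pow(2).sum(-1).mean()

    return l_reconstruct + l_sparsity + l_aux
```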