Rohin Shah comments on Improving Dictionary Learning with Gated Sparse Autoencoders

Rohin Shah 26 Apr 2024 6:50 UTC
This suggestion seems less expressive than (but similar in spirit to) the “rescale & shift” baseline we compare to in Figure 9. The rescale & shift baseline is sufficient to resolve shrinkage, but it doesn’t capture all the benefits of Gated SAEs.
The core point is that L1 regularization introduces many biases, of which shrinkage is just one example, so you want to localize the effect of L1 as much as possible. In our setup L1 applies to $\text{ReLU}(\pi_{gate}(x))$, so you might think of $\pi_{gate}$ as “tainted”, and want to use it as little as possible. The only thing you really need L1 for is to deter the model from activating too many features at once, i.e. you need it to apply to one bit per feature (whether that feature is on or off). The Heaviside step function makes sure we extract just that one bit and rely on $f_{mag}$ for everything else.
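To make the "one bit per feature" point concrete, here is a minimal NumPy sketch of a gated encoder. It is an illustration under stated assumptions, not the paper's implementation: the function name `gated_encoder`, the parameter names (`W_gate`, `b_gate`, `W_mag`, `b_mag`), and the random toy shapes are all hypothetical, and the decoder and other training details are left out.

```python
import numpy as np

def gated_encoder(x, W_gate, b_gate, W_mag, b_mag):
    """Toy gated encoder (hypothetical names; not the paper's exact code).

    Returns the feature activations and the L1 sparsity penalty per example.
    """
    # Gate pre-activations: the only quantity the L1 penalty touches.
    pi_gate = x @ W_gate + b_gate
    # Heaviside step: keep just one bit per feature (on / off).
    gate = (pi_gate > 0).astype(x.dtype)
    # Magnitude path: not penalized by L1 in this sketch.
    f_mag = np.maximum(x @ W_mag + b_mag, 0.0)
    features = gate * f_mag
    # L1 applied to ReLU(pi_gate(x)), summed over features.
    l1_penalty = np.maximum(pi_gate, 0.0).sum(axis=-1)
    return features, l1_penalty

# Toy usage with random parameters (assumed shapes: 8 inputs, 16 features).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_gate = rng.normal(size=(8, 16))
b_gate = rng.normal(size=16)
W_mag = rng.normal(size=(8, 16))
b_mag = rng.normal(size=16)

features, l1_penalty = gated_encoder(x, W_gate, b_gate, W_mag, b_mag)
```

In this sketch the output depends on $\pi_{gate}$ only through its sign, so whatever bias L1 puts on the size of $\pi_{gate}$ has no direct path into the feature magnitudes.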