Senthooran Rajamanoharan comments on Improving Dictionary Learning with Gated Sparse Autoencoders

Senthooran Rajamanoharan 31 May 2024 8:57 UTC
1 point
We found that exactly that form of sparsity penalty did reduce shrinkage with standard (ungated) SAEs, and provided a decent boost to loss recovered at low L0. (We didn’t evaluate interpretability, though.) But then we hit upon Gated SAEs, which looked even better and for which modifying the sparsity penalty in this way feels less necessary, so we haven’t experimented with combining the two.