This thread reminds me that comparing feature absorption in SAEs with tied encoder/decoder weights and in end-to-end SAEs seems like a valuable follow-up.
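For anyone who wants the concrete distinction, here's a rough PyTorch sketch of tied vs. untied encoder/decoder weights (hypothetical class and variable names, not from any particular SAE codebase):

```python
import torch
import torch.nn as nn


class TiedSAE(nn.Module):
    """SAE where the decoder is the transpose of the encoder, as in the
    original sparse autoencoder setups (sketch only)."""

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # Tied: reconstruct with W_enc.T rather than a separate decoder matrix.
        return acts @ self.W_enc.T + self.b_dec


class UntiedSAE(TiedSAE):
    """Same as above, but with an independent decoder matrix."""

    def __init__(self, d_model: int, d_sae: int):
        super().__init__(d_model, d_sae)
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        return acts @ self.W_dec + self.b_dec
```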
Another approach would be to use a per-token decoder bias, as seen in some previous work: https://www.lesswrong.com/posts/P8qLZco6Zq8LaLHe9/tokenized-saes-infusing-per-token-biases. But this would only solve it when the absorbing feature corresponds to a token; if the absorbed feature is more abstract, then this wouldn't work as well.
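Roughly what I mean by a per-token decoder bias, as a minimal sketch rather than the linked post's exact implementation (names and shapes are made up):

```python
import torch
import torch.nn as nn


class PerTokenBiasSAE(nn.Module):
    """SAE with a per-token decoder bias: a learned lookup table adds a
    token-specific vector to the reconstruction (hypothetical sketch)."""

    def __init__(self, d_model: int, d_sae: int, vocab_size: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        # One bias vector per vocabulary entry.
        self.token_bias = nn.Embedding(vocab_size, d_model)

    def forward(self, x: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # The token lookup absorbs token-specific structure so the sparse
        # features don't have to; this only helps when the absorbed
        # information is literally the current token.
        return acts @ self.W_dec + self.b_dec + self.token_bias(token_ids)
```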
Semi-relatedly, since most (if not all) of the SAE work since the original paper has gone into untied encoder/decoder weights, we don’t really know whether modern SAE architectures like JumpReLU or TopK suffer as large a performance hit from tying the weights as the original SAEs did, especially with the gains from adding token biases.
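For concreteness, minimal sketches of the two activation functions mentioned (simplified; e.g. JumpReLU's threshold is a learned per-feature parameter in practice):

```python
import torch


def topk_activation(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """TopK SAE activation: keep only the k largest pre-activations per example."""
    values, indices = pre_acts.topk(k, dim=-1)
    acts = torch.zeros_like(pre_acts)
    return acts.scatter(-1, indices, torch.relu(values))


def jumprelu_activation(pre_acts: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    """JumpReLU SAE activation: zero out anything below a per-feature threshold."""
    return pre_acts * (pre_acts > threshold)
```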