Great work! I have been working on something very similar and will publish my results here some time next week, but can already give a sneak peek:
The SAEs here were trained on only 100M tokens (1/3 of the TinyStories[11:1] dataset). The language model was trained for 3 epochs on the 300M-token TinyStories dataset. It would be good to validate these results on more ‘real’ language models and to train SAEs with much more data.
I can confirm that on Gemma-2-2B, Matryoshka SAEs dramatically improve the absorption score on the first-letter task from Chanin et al., as implemented in SAEBench!
Is there a nice way to extend the Matryoshka method to top-k SAEs?
Yes! My experiments with Matryoshka SAEs use BatchTopK; a rough sketch of how the two combine is below.
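Roughly, it looks like this. (A minimal PyTorch sketch, not my actual implementation: the class name, parameter names, and the choice to apply BatchTopK once over the full pre-activations before the nested reconstructions are all my assumptions here.)

```python
import torch
import torch.nn as nn


class MatryoshkaBatchTopKSAE(nn.Module):
    """Hypothetical sketch: a Matryoshka SAE whose sparsity comes from a
    BatchTopK activation instead of a per-sample TopK or an L1 penalty.
    `prefix_sizes` are the nested dictionary sizes, largest last."""

    def __init__(self, d_model: int, k: int, prefix_sizes: list[int]):
        super().__init__()
        dict_size = prefix_sizes[-1]
        self.k, self.prefix_sizes = k, prefix_sizes
        self.W_enc = nn.Parameter(torch.randn(d_model, dict_size) * 0.01)
        self.W_dec = nn.Parameter(self.W_enc.detach().T.clone())
        self.b_enc = nn.Parameter(torch.zeros(dict_size))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        pre = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # BatchTopK: keep the k * batch_size largest activations across the
        # whole batch, so per-sample sparsity is k only on average.
        flat = pre.flatten()
        mask = torch.zeros_like(flat)
        mask[flat.topk(self.k * x.shape[0]).indices] = 1.0
        return (flat * mask).view_as(pre)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        acts = self.encode(x)
        # Matryoshka part: every nested prefix of the dictionary must
        # reconstruct x on its own; the prefix losses are summed.
        loss = x.new_zeros(())
        for m in self.prefix_sizes:
            recon = acts[:, :m] @ self.W_dec[:m] + self.b_dec
            loss = loss + (recon - x).pow(2).sum(-1).mean()
        return loss
```

Usage would be something like `sae = MatryoshkaBatchTopKSAE(d_model=512, k=32, prefix_sizes=[64, 256, 1024])` followed by `loss = sae(activations)` on a batch of residual-stream activations. The nice property is that BatchTopK fixes the sparsity level directly, so the Matryoshka prefix losses can be summed without tuning a separate sparsity coefficient per prefix.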
Are you planning to continue this line of research? If so, I would be interested to collaborate (or otherwise at least coordinate on not doing duplicate work).
That’s very cool, I’m looking forward to seeing those results! The top-k extension is particularly interesting, as that was something I wasn’t sure how to approach.
I imagine you’ve explored important directions I haven’t touched, like better benchmarking, a top-k implementation, and testing on larger models. Having multiple independent validations of an approach also seems valuable.
I’d be interested in continuing this line of research, especially circuits with Matryoshka SAEs. I’d love to hear about what directions you’re thinking of. Would you want to have a call sometime about collaboration or coordination? (I’ll DM you!)
Really looking forward to reading your post!