Great work! I have been working on something very similar and will publish my results here some time next week, but can already give a sneak peek:
The SAEs here were trained on only 100M tokens (1/3 of the TinyStories[11:1] dataset). The language model was trained for 3 epochs on the 300M-token TinyStories dataset. It would be good to validate these results on more ‘real’ language models and to train SAEs with much more data.
I can confirm that on Gemma-2-2B, Matryoshka SAEs dramatically improve the absorption score on the first-letter task from Chanin et al., as implemented in SAEBench!
Is there a nice way to extend the Matryoshka method to top-k SAEs?
Yes! My experiments with Matryoshka SAEs use BatchTopK; a rough sketch of how the two combine is below.
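Roughly, it looks like this. (A minimal PyTorch sketch, not my actual implementation: the class name, parameter names, and the choice to apply BatchTopK once over the full pre-activations before the nested reconstructions are all my assumptions here.)

```python
import torch
import torch.nn as nn


class MatryoshkaBatchTopKSAE(nn.Module):
    """Hypothetical sketch: a Matryoshka SAE whose sparsity comes from a
    BatchTopK activation instead of a per-sample TopK or an L1 penalty.
    `prefix_sizes` are the nested dictionary sizes, largest last."""

    def __init__(self, d_model: int, k: int, prefix_sizes: list[int]):
        super().__init__()
        dict_size = prefix_sizes[-1]
        self.k, self.prefix_sizes = k, prefix_sizes
        self.W_enc = nn.Parameter(torch.randn(d_model, dict_size) * 0.01)
        self.W_dec = nn.Parameter(self.W_enc.detach().T.clone())
        self.b_enc = nn.Parameter(torch.zeros(dict_size))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        pre = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # BatchTopK: keep the k * batch_size largest activations across the
        # whole batch, so per-sample sparsity is k only on average.
        flat = pre.flatten()
        mask = torch.zeros_like(flat)
        mask[flat.topk(self.k * x.shape[0]).indices] = 1.0
        return (flat * mask).view_as(pre)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        acts = self.encode(x)
        # Matryoshka part: every nested prefix of the dictionary must
        # reconstruct x on its own; the prefix losses are summed.
        loss = x.new_zeros(())
        for m in self.prefix_sizes:
            recon = acts[:, :m] @ self.W_dec[:m] + self.b_dec
            loss = loss + (recon - x).pow(2).sum(-1).mean()
        return loss
```

Usage would be something like `sae = MatryoshkaBatchTopKSAE(d_model=512, k=32, prefix_sizes=[64, 256, 1024])` followed by `loss = sae(activations)` on a batch of residual-stream activations. The nice property is that BatchTopK fixes the sparsity level directly, so the Matryoshka prefix losses can be summed without tuning a separate sparsity coefficient per prefix.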
Are you planning to continue this line of research? If so, I would be interested to collaborate (or otherwise at least coordinate on not doing duplicate work).
That’s very cool, I’m looking forward to seeing those results! The top-k extension is particularly interesting, as that was something I wasn’t sure how to approach.
I imagine you’ve explored important directions I haven’t touched, like better benchmarking, a top-k implementation, and testing on larger models. Having multiple independent validations of an approach also seems valuable.
I’d be interested in continuing this line of research, especially circuits with Matryoshka SAEs. I’d love to hear about what directions you’re thinking of. Would you want to have a call sometime about collaboration or coordination? (I’ll DM you!)
Really looking forward to reading your post!