Great work! Very excited to see work in this direction (In fact, I didn’t know you were working on this, so I’d expressed enthusiasm for MoE SAEs in our recent list of project ideas published just a few days ago!)
Comments:
I’d love to see some geometric analysis of the router. Is it just approximately a down-projection from the encoder features learned by a dense SAE trained on the same activations?
Following Fedus et al., we route to a single expert SAE. It is possible that selecting several experts will improve performance. The computational cost will scale with the number of experts chosen.
If there are some very common features in particular layers (e.g. an ‘attend to BOS’ feature), then restricting the SAE to a single active expert at a time will potentially force it to learn those common features redundantly in every expert.
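For concreteness, the top-1 routing scheme quoted above could look something like the following. This is a toy NumPy sketch with invented dimensions and weight names, not the post’s actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, d_expert = 16, 4, 64  # hypothetical sizes

# Hypothetical parameters: one router plus one small SAE per expert.
W_router = rng.normal(size=(d_model, n_experts)) * 0.1
W_enc = rng.normal(size=(n_experts, d_model, d_expert)) * 0.1
W_dec = rng.normal(size=(n_experts, d_expert, d_model)) * 0.1
b_enc = np.zeros((n_experts, d_expert))

def switch_sae_forward(x):
    """Route activation x to a single expert SAE (top-1, as in Switch Transformers)."""
    logits = x @ W_router                 # (n_experts,)
    z = np.exp(logits - logits.max())
    probs = z / z.sum()                   # softmax over experts
    e = int(np.argmax(probs))             # top-1 expert
    f = np.maximum(x @ W_enc[e] + b_enc[e], 0.0)  # expert e's ReLU feature activations
    x_hat = probs[e] * (f @ W_dec[e])     # gate by router prob so routing stays differentiable
    return x_hat, e, f

x = rng.normal(size=d_model)
x_hat, expert, feats = switch_sae_forward(x)
```

Because only one expert’s encoder/decoder is evaluated per input, compute scales with the number of selected experts rather than with the total dictionary size.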
+1 to similar concerns. I would probably have left one expert always on, which should remove some of the redundant features.
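The always-on expert suggestion could be sketched as below; again a hypothetical NumPy sketch with invented sizes, where a shared expert reconstructs alongside whichever expert the router picks:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, d_expert, d_shared = 16, 4, 64, 32  # invented sizes

W_router = rng.normal(size=(d_model, n_experts)) * 0.1
W_enc = rng.normal(size=(n_experts, d_model, d_expert)) * 0.1
W_dec = rng.normal(size=(n_experts, d_expert, d_model)) * 0.1
W_enc_shared = rng.normal(size=(d_model, d_shared)) * 0.1
W_dec_shared = rng.normal(size=(d_shared, d_model)) * 0.1

def forward_with_shared(x):
    """Top-1 routed expert plus a shared expert that is always active."""
    f_shared = np.maximum(x @ W_enc_shared, 0.0)  # common features (e.g. 'attend to BOS') can live here
    e = int(np.argmax(x @ W_router))              # routed expert for specialised features
    f_e = np.maximum(x @ W_enc[e], 0.0)
    x_hat = f_shared @ W_dec_shared + f_e @ W_dec[e]
    return x_hat, e

x = rng.normal(size=d_model)
x_hat, e = forward_with_shared(x)
```

The idea is that very common features only need to be represented once, in the shared expert, so the routed experts are free to specialise.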
Hi Lee and Arthur, thanks for the feedback! I agree that routing to a single expert will force redundant features and will experiment with Arthur’s suggestion. I haven’t taken a close look at the router/expert geometry yet but plan to do so soon.
Hi Lee, if I may ask, when you say “geometric analysis” of the router, do you mean analysis of the parameters or of the activations? Are there any papers that perform the sort of analysis you’d like to see done? Asking from the perspective of someone who understands neural networks thoroughly but is new to mechinterp.
Both of these seem like interesting directions (I had parameters in mind, but params and activations are too closely linked to ignore one or the other). And I don’t have a super clear idea but something like representational similarity analysis between SwitchSAEs and regular SAEs could be interesting. This is just one possibility of many though. I haven’t thought about it for long enough to be able to list many more, but it feels like a direction with low hanging fruit for sure. For papers, here’s a good place to start for RSA: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3730178/
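As one concrete instantiation of the RSA idea, a toy sketch: build a representational dissimilarity matrix (RDM) over a stimulus set for each SAE’s feature activations, then correlate the RDMs’ upper triangles. The arrays below are random stand-ins for real dense-SAE and SwitchSAE features:

```python
import numpy as np

def rdm(acts):
    """Representational dissimilarity matrix: 1 - Pearson correlation between stimulus rows."""
    return 1.0 - np.corrcoef(acts)

def rsa_score(acts_a, acts_b):
    """Compare two representations by correlating the upper triangles of their RDMs."""
    ra, rb = rdm(acts_a), rdm(acts_b)
    iu = np.triu_indices_from(ra, k=1)
    return np.corrcoef(ra[iu], rb[iu])[0, 1]

rng = np.random.default_rng(0)
n_stimuli = 50
feats_dense = rng.normal(size=(n_stimuli, 128))   # stand-in for dense SAE features
feats_switch = feats_dense + 0.1 * rng.normal(size=(n_stimuli, 128))  # stand-in for SwitchSAE features
score = rsa_score(feats_dense, feats_switch)      # near 1 when the representations are similar
```

A nice property of RSA here is that it compares representations through their pairwise-distance structure, so the two SAEs don’t need to share a dictionary size or a feature ordering.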
Consider integrating with SAELens.
Thank you very much for your reply. I appreciate the commentary and direction.