Great work! Very excited to see work in this direction (In fact, I didn’t know you were working on this, so I’d expressed enthusiasm for MoE SAEs in our recent list of project ideas published just a few days ago!)
Comments:
I’d love to see some geometric analysis of the router. Is it just approximately a down-projection from the encoder features learned by a dense SAE trained on the same activations?
Following Fedus et al., we route to a single expert SAE. It is possible that selecting several experts will improve performance. The computational cost will scale with the number of experts chosen.
If there are some very common features in particular layers (e.g. an ‘attend to BOS’ feature), then restricting the SAE to a single active expert at a time will potentially force it to learn those common features redundantly in every expert.
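For concreteness, the top-1 routing scheme quoted above could look something like the following. This is a toy NumPy sketch with invented dimensions and weight names, not the post’s actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, d_expert = 16, 4, 64  # hypothetical sizes

# Hypothetical parameters: one router plus one small SAE per expert.
W_router = rng.normal(size=(d_model, n_experts)) * 0.1
W_enc = rng.normal(size=(n_experts, d_model, d_expert)) * 0.1
W_dec = rng.normal(size=(n_experts, d_expert, d_model)) * 0.1
b_enc = np.zeros((n_experts, d_expert))

def switch_sae_forward(x):
    """Route activation x to a single expert SAE (top-1, as in Switch Transformers)."""
    logits = x @ W_router                 # (n_experts,)
    z = np.exp(logits - logits.max())
    probs = z / z.sum()                   # softmax over experts
    e = int(np.argmax(probs))             # top-1 expert
    f = np.maximum(x @ W_enc[e] + b_enc[e], 0.0)  # expert e's ReLU feature activations
    x_hat = probs[e] * (f @ W_dec[e])     # gate by router prob so routing stays differentiable
    return x_hat, e, f

x = rng.normal(size=d_model)
x_hat, expert, feats = switch_sae_forward(x)
```

Because only one expert’s encoder/decoder is evaluated per input, compute scales with the number of selected experts rather than with the total dictionary size.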
+1 to similar concerns. I would probably have left one expert always on, which should remove some of the redundant features.
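The always-on expert suggestion could be sketched as below; again a hypothetical NumPy sketch with invented sizes, where a shared expert reconstructs alongside whichever expert the router picks:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, d_expert, d_shared = 16, 4, 64, 32  # invented sizes

W_router = rng.normal(size=(d_model, n_experts)) * 0.1
W_enc = rng.normal(size=(n_experts, d_model, d_expert)) * 0.1
W_dec = rng.normal(size=(n_experts, d_expert, d_model)) * 0.1
W_enc_shared = rng.normal(size=(d_model, d_shared)) * 0.1
W_dec_shared = rng.normal(size=(d_shared, d_model)) * 0.1

def forward_with_shared(x):
    """Top-1 routed expert plus a shared expert that is always active."""
    f_shared = np.maximum(x @ W_enc_shared, 0.0)  # common features (e.g. 'attend to BOS') can live here
    e = int(np.argmax(x @ W_router))              # routed expert for specialised features
    f_e = np.maximum(x @ W_enc[e], 0.0)
    x_hat = f_shared @ W_dec_shared + f_e @ W_dec[e]
    return x_hat, e

x = rng.normal(size=d_model)
x_hat, e = forward_with_shared(x)
```

The idea is that very common features only need to be represented once, in the shared expert, so the routed experts are free to specialise.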
Hi Lee and Arthur, thanks for the feedback! I agree that routing to a single expert will force redundant features and will experiment with Arthur’s suggestion. I haven’t taken a close look at the router/expert geometry yet but plan to do so soon.
Hi Lee, if I may ask, when you say “geometric analysis” of the router, do you mean analysis of the parameters or of the activations? Are there any papers that perform the sort of analysis you’d like to see done? Asking from the perspective of someone who understands neural networks thoroughly but is new to mechinterp.
Both of these seem like interesting directions (I had parameters in mind, but params and activations are too closely linked to ignore one or the other). And I don’t have a super clear idea but something like representational similarity analysis between SwitchSAEs and regular SAEs could be interesting. This is just one possibility of many though. I haven’t thought about it for long enough to be able to list many more, but it feels like a direction with low hanging fruit for sure. For papers, here’s a good place to start for RSA: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3730178/
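As one concrete instantiation of the RSA idea, a toy sketch: build a representational dissimilarity matrix (RDM) over a stimulus set for each SAE’s feature activations, then correlate the RDMs’ upper triangles. The arrays below are random stand-ins for real dense-SAE and SwitchSAE features:

```python
import numpy as np

def rdm(acts):
    """Representational dissimilarity matrix: 1 - Pearson correlation between stimulus rows."""
    return 1.0 - np.corrcoef(acts)

def rsa_score(acts_a, acts_b):
    """Compare two representations by correlating the upper triangles of their RDMs."""
    ra, rb = rdm(acts_a), rdm(acts_b)
    iu = np.triu_indices_from(ra, k=1)
    return np.corrcoef(ra[iu], rb[iu])[0, 1]

rng = np.random.default_rng(0)
n_stimuli = 50
feats_dense = rng.normal(size=(n_stimuli, 128))   # stand-in for dense SAE features
feats_switch = feats_dense + 0.1 * rng.normal(size=(n_stimuli, 128))  # stand-in for SwitchSAE features
score = rsa_score(feats_dense, feats_switch)      # near 1 when the representations are similar
```

A nice property of RSA here is that it compares representations through their pairwise-distance structure, so the two SAEs don’t need to share a dictionary size or a feature ordering.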
Consider integrating with SAELens.
Thank you very much for your reply. I appreciate the commentary and direction.