Clément Dumas comments on Self-explaining SAE features

Clément Dumas 6 Aug 2024 8:43 UTC
5 points
0
Nice post, awesome work and very well presented! I’m also working on similar stuff (using ~SelfIE to make the model reason about its own internals) and was wondering: did you try patching the SAE feature three times instead of once (xxx instead of x)? This is one of the tricks they use in SelfIE.
- Dmitrii Kharlapenko 6 Aug 2024 13:55 UTC
  4 points
  0
  Thanks! We did try to use it in the repeat setting to make the model produce more than a single token, but it did not work well.
  
  And as far as I remember, it also did not improve the meaning prompt much.
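
For readers unfamiliar with the "xxx instead of x" trick discussed above, here is a minimal sketch of what patching one feature into one versus three placeholder positions could look like. It is a toy illustration on plain Python lists, not the post's or SelfIE's actual code: `patch_placeholders`, the token ids, the feature direction, and the `scale` parameter are all hypothetical.

```python
# Toy sketch: overwrite the residual-stream vectors at placeholder positions
# with a scaled SAE feature direction. No real model is involved; all names
# and values here are hypothetical.

def patch_placeholders(resid, token_ids, placeholder_id, feature_dir, scale=1.0):
    """Return (patched copy of resid, list of patched positions)."""
    patched = [list(vec) for vec in resid]  # copy so the input stays untouched
    positions = [i for i, t in enumerate(token_ids) if t == placeholder_id]
    for pos in positions:
        patched[pos] = [scale * x for x in feature_dir]
    return patched, positions

X = 99                          # hypothetical placeholder token id
feature_dir = [0.5, -1.0, 2.0]  # made-up direction for one SAE feature

single_ids = [1, 2, X, 3]        # prompt with one placeholder ("x")
triple_ids = [1, 2, X, X, X, 3]  # prompt with three placeholders ("xxx")

single_resid = [[0.0] * 3 for _ in single_ids]
triple_resid = [[0.0] * 3 for _ in triple_ids]

single_patched, single_pos = patch_placeholders(single_resid, single_ids, X, feature_dir)
triple_patched, triple_pos = patch_placeholders(triple_resid, triple_ids, X, feature_dir, scale=2.0)

print(single_pos, triple_pos)
```

The only difference between the two settings is how many positions receive the same vector; in a real setup the patching would happen inside a forward hook at a chosen layer rather than on a standalone list.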