Nice post, awesome work and very well presented!
I’m also working on similar stuff (using ~selfIE to make the model reason about its own internals) and was wondering: did you try patching the SAE features three times instead of once (xxx instead of x)? This is one of the tricks they use in selfIE.
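For anyone unfamiliar with the trick, here is a minimal sketch of what "xxx instead of x" could look like. It assumes that "patching" means overwriting the residual-stream activation at each placeholder token with the same SAE feature direction. The prompt template, `scale`, helper names and random stand-in activations are all hypothetical. In a real setup the activations would come from a forward hook at some chosen layer.

```python
import numpy as np

def placeholder_positions(tokens, placeholder="x"):
    """Return the indices of every placeholder token in the prompt (hypothetical helper)."""
    return [i for i, tok in enumerate(tokens) if tok == placeholder]

def patch_feature(resid, positions, feature_dir, scale=1.0):
    """Return a copy of `resid` (seq_len x d_model) in which every listed
    position is overwritten with the same scaled feature direction."""
    patched = resid.copy()
    for pos in positions:
        patched[pos] = scale * feature_dir
    return patched

# Hypothetical prompts: one placeholder ("x") vs. three repeated ones ("xxx").
single_prompt = ["What", "does", "x", "mean", "?"]
triple_prompt = ["What", "does", "x", "x", "x", "mean", "?"]

rng = np.random.default_rng(0)
d_model = 16
feature_dir = rng.normal(size=d_model)   # stand-in for an SAE decoder row
feature_dir /= np.linalg.norm(feature_dir)

# Stand-in activations; in practice these would be captured by a hook on the model.
resid_single = rng.normal(size=(len(single_prompt), d_model))
resid_triple = rng.normal(size=(len(triple_prompt), d_model))

patched_single = patch_feature(
    resid_single, placeholder_positions(single_prompt), feature_dir, scale=8.0)
patched_triple = patch_feature(
    resid_triple, placeholder_positions(triple_prompt), feature_dir, scale=8.0)
```

The only difference between the two settings is how many positions receive the vector. The rest of the prompt is left untouched, so any change in the model's output comes from the repetition alone.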
Thanks! We did try to use it in the repeat setting to make the model produce more than a single token, but it did not work well.
And as far as I remember, it also did not improve the meaning prompt much.