Thanks Neel, keep this coming—even if only once every few years :) You helped me clarify lots of confusion I had about the existing techniques.
I am a huge fan of steering vectors / control vectors, and I would love to see future research showing if they can be linearly combined together to achieve multiple behaviours simultaneously (I made a post about this). I don’t think it’s just “internal work”—I think it’s a hint to the fact that language semantics can be linearised as vector spaces (I hope I will be able to formalise mathematically this intuition).
Here a proposal of a possible ELK solution using that approach.
I’m also pretty interested in combining steering vectors. I think a particularly promising direction is using SAE decoder vectors for this, as SAEs are designed to find feature vectors that independently vary and can be added.
I agree steering vectors are important as evidence for the linear representation hypothesis (though at this point I consider SAEs to be much superior as evidence, and think they’re more interesting to focus on)
Thanks Neel, keep this coming—even if only once every few years :) You helped me clarify lots of confusion I had about the existing techniques.
I am a huge fan of steering vectors / control vectors, and I would love to see future research showing if they can be linearly combined together to achieve multiple behaviours simultaneously (I made a post about this). I don’t think it’s just “internal work”—I think it’s a hint to the fact that language semantics can be linearised as vector spaces (I hope I will be able to formalise mathematically this intuition).
Here a proposal of a possible ELK solution using that approach.
Glad you liked the post!
I’m also pretty interested in combining steering vectors. I think a particularly promising direction is using SAE decoder vectors for this, as SAEs are designed to find feature vectors that independently vary and can be added.
I agree steering vectors are important as evidence for the linear representation hypothesis (though at this point I consider SAEs to be much superior as evidence, and think they’re more interesting to focus on)