Thanks for the great post, I really enjoyed reading it!
I love this research direction combining unsupervised methods with steering vectors, and I'm looking forward to your next findings.
Just a quick question: in the conversation in the red-teaming section, is the learned vector applied to every token generated during the conversation?
Yes, the learned vectors are always applied at every token (for all examples).
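For anyone who wants a concrete picture of "applied at every token", here is a minimal toy sketch. It assumes the steering vector is simply added to the activations at one layer, which this thread does not spell out. The function name `steer_every_token`, the `scale` parameter, and the numbers are all hypothetical and not taken from the post's code.

```python
def steer_every_token(hidden_states, steering_vector, scale=1.0):
    """Return a copy of hidden_states with scale * steering_vector added
    at each token position.

    hidden_states: list of per-token activation vectors (lists of floats).
    steering_vector: list of floats, same length as each activation vector.
    """
    return [
        [h + scale * v for h, v in zip(token_state, steering_vector)]
        for token_state in hidden_states
    ]


# Toy activations for a 3-token sequence with hidden size 2 (made-up numbers).
hidden = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
vector = [1.0, -1.0]

steered = steer_every_token(hidden, vector)
print(steered)  # [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
```

The same vector shifts every position, with no mask that limits it to the last token or to a particular turn of the conversation.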