Great post! Would love to see something like this for all the methods in play at the moment.
BTW, I think nnsight is the spiritual successor of baukit, from the same group, and I think they plan to merge the two at some point. Here is an implementation using it, for reference :).
from nnsight import LanguageModel

# Load the language model
model = LanguageModel("gpt2")

# Define the steering vectors: save the layer-6 residual stream for each prompt
with model.invoke("Love") as _:
    act_love = model.transformer.h[6].output[0][:, :, :].save()

with model.invoke("Hate") as _:
    act_hate = model.transformer.h[6].output[0][:, :, :].save()

# Saved proxies hold their tensors in .value once the run has finished
steering_vec = act_love.value - act_hate.value

# Generate text while steering: add the vector to the first two positions at layer 6
test_sentence = "I think dogs are "
with model.generate() as generator:
    with generator.invoke(test_sentence) as _:
        model.transformer.h[6].output[0][:, :2, :] += steering_vec[:, :2, :]
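The arithmetic underneath (often called activation addition) doesn't depend on nnsight: take the difference of two prompts' activations at one layer, then add it to the first few positions of another prompt's activations at that same layer. Here is a minimal plain-Python sketch of just that step. The function names are hypothetical, the activations are made-up toy numbers rather than real GPT-2 outputs, and `coeff` is an extra scaling knob I've added as an assumption (`coeff=1.0` matches the plain `+=`).

```python
# Toy sketch of activation addition ("steering") on plain lists.
# Assumption: activations are lists of per-token vectors [seq_len][d_model];
# real code would use tensors of shape [batch, seq_len, d_model].
from typing import List

Vector = List[float]
Acts = List[Vector]


def steering_vector(acts_pos: Acts, acts_neg: Acts) -> Acts:
    """Element-wise difference of two prompts' activations at one layer."""
    # The subtraction only makes sense position by position, so require equal lengths.
    if len(acts_pos) != len(acts_neg):
        raise ValueError("prompts must have the same number of tokens")
    return [[p - n for p, n in zip(vp, vn)] for vp, vn in zip(acts_pos, acts_neg)]


def add_steering(acts: Acts, steer: Acts, num_positions: int, coeff: float = 1.0) -> Acts:
    """Return a copy of `acts` with coeff * steer added to the first positions.

    Only the first `num_positions` positions are modified (fewer if either
    sequence is shorter); later positions are left unchanged.
    """
    out = [list(v) for v in acts]
    for i in range(min(num_positions, len(steer), len(out))):
        out[i] = [a + coeff * s for a, s in zip(out[i], steer[i])]
    return out


if __name__ == "__main__":
    love = [[1.0, 2.0], [0.5, 0.5]]  # hypothetical layer-6 activations for "Love"
    hate = [[0.0, 3.0], [0.5, 1.5]]  # hypothetical layer-6 activations for "Hate"
    steer = steering_vector(love, hate)
    prompt = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # hypothetical test-sentence activations
    print(add_steering(prompt, steer, num_positions=2))
    # prints [[1.0, -1.0], [1.0, 0.0], [2.0, 2.0]]
```

Only the first two positions change, mirroring the `[:, :2, :]` slice in the snippet above. The third position passes through untouched.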