Gytis Daujotas comments on Experiments in Evaluating Steering Vectors

Gytis Daujotas 20 Jun 2023 22:05 UTC
1 point
Definitely a good point! I wanted to get a rough sense of whether this evaluation approach would work at all, so I deliberately aimed to be monomaniacal. If I were to continue with this, you’re right: I think figuring out what a human would actually want to see in a completion would be the next step in seeing whether this technique can be useful in practice.
As for the token probabilities, I was mostly inspired by seeing this used in Ought’s work on factored cognition:
https://github.com/rawmaterials/ice/blob/4493d6198955804cc03069c3f88bda1b23de616f/ice/recipes/experiments_and_arms/prompts/can_name_exps.py#L161
It seems like the miscellaneous token probabilities usually add up to less than 1% of the total probability mass:
https://i.imgur.com/aznsQdr.png
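To make the "less than 1%" check concrete, here is a rough sketch of how the leftover mass could be measured. It is not the code from the ICE repo or from my experiments. The function name `answer_mass`, the answer tokens `" Yes"` and `" No"`, and the numbers are all made up for illustration. It assumes you already have next-token log probabilities as a plain dict.

```python
import math


def answer_mass(top_logprobs, answer_tokens):
    """Split next-token probability mass into answer tokens vs. everything else.

    top_logprobs: dict mapping token string -> log probability (hypothetical input).
    answer_tokens: the tokens we treat as valid answers (hypothetical choice).
    """
    # Convert log probabilities back to probabilities.
    probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items()}

    # Mass assigned to each answer token (0.0 if the token is missing).
    on_answers = {tok: probs.get(tok, 0.0) for tok in answer_tokens}
    covered = sum(on_answers.values())

    # Everything not on an answer token, including tokens absent from the dict.
    misc = 1.0 - covered

    # Renormalize over the answer tokens only.
    if covered > 0:
        normalized = {tok: p / covered for tok, p in on_answers.items()}
    else:
        normalized = {}
    return normalized, misc


# Illustrative numbers only, not measured results.
example = {
    " Yes": math.log(0.62),
    " No": math.log(0.375),
    " Maybe": math.log(0.003),
}
normalized, misc = answer_mass(example, [" Yes", " No"])
print(f"misc mass: {misc:.4f}")
print(f"normalized: {normalized}")
```

If `misc` stays small across prompts, renormalizing over just the answer tokens loses very little information, which is roughly what the screenshot suggests.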