Daniel Salami comments on Steering GPT-2-XL by adding an activation vector

Daniel Salami 15 May 2023 15:20 UTC
2 points
This seems somewhat related to this article. I came across a paper (Human-AI Shared Control via Policy Dissection) that runs a frequency analysis on the neurons of an RL policy to find the ones tied to particular behaviours, and then uses those neurons to control the agent's actions. I am wondering whether the same thing can be done with language models. The same technique might also be useful for finding vectors that do specific things.