David Reber comments on Steering GPT-2-XL by adding an activation vector

David Reber 25 May 2023 17:10 UTC
LW: 4 AF: 1
Another related work: Concept Algebra for Text-Controlled Vision Models (Disclosure: I did not author this paper, but I am in the lab that did, under Victor Veitch at UChicago. Any mistakes in this comment are my own). We haven’t prioritized a blog post about the paper, so it makes sense that this community isn’t familiar with it.
The Concept Algebra paper demonstrates that for text-to-image models like Stable Diffusion, there exist linear subspaces of the score embedding space on which you can do the same kind of concept editing/control as with word2vec.
Importantly, the paper comes with some theoretical investigation into why this might be the case, including articulating the necessary assumptions/conditions (which this purely empirical post does not do).
I conjecture that some activation additions in this post fail to have the desired effect because they violate conditions analogous to those in Concept Algebra. Section E.1 of the paper’s appendix gives me a bit of déjà vu: it shows empirical results that fail to act as expected when the conditions of completeness and causal separability don’t hold.
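For readers unfamiliar with this kind of subspace editing, here is a toy NumPy sketch of the general idea: swap the component of one vector that lies in a "concept" subspace for the corresponding component of another vector, leaving everything outside the subspace untouched. This is only an illustration under stated assumptions, not the paper's actual procedure. The function name `swap_concept`, the random `basis`, and the dimensions are all hypothetical, and real score representations would come from a model rather than a random generator.

```python
import numpy as np

def swap_concept(z_source, z_target, basis):
    """Replace the part of z_source inside span(basis) with that of z_target.

    basis: (d, k) array whose columns span a hypothetical concept subspace.
    """
    # Orthonormalize the columns so that Q @ Q.T is the orthogonal projector.
    Q, _ = np.linalg.qr(basis)
    P = Q @ Q.T
    # Keep z_source outside the subspace, take z_target inside it.
    return z_source - P @ z_source + P @ z_target

rng = np.random.default_rng(0)
d = 8                              # toy embedding dimension (assumption)
basis = rng.normal(size=(d, 2))    # stand-in for a learned concept subspace
z_a = rng.normal(size=d)           # stand-in for the embedding being edited
z_b = rng.normal(size=d)           # stand-in for the embedding supplying the concept
z_edit = swap_concept(z_a, z_b, basis)
```

The edit only behaves as intended if the concept really is captured by a linear subspace and is separable from the rest of the representation, which is the kind of condition the comment above conjectures may fail for some activation additions.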
- Bogdan Ionut Cirstea 4 Jun 2023 21:22 UTC
  3 points
  Seems very related: Linear Spaces of Meanings: Compositional Structures in Vision-Language Models. Notably, the (approximate) compositionality of language/reality should bode well for the scalability of linear activation engineering methods.
  - Bogdan Ionut Cirstea 6 Jun 2023 18:54 UTC
    1 point
    And this structure can be used as regularization for soft prompts.