davidad comments on Steering GPT-2-XL by adding an activation vector

davidad 15 May 2023 13:25 UTC
LW: 15 AF: 7
19
AF
On the object-level, deriving task vectors in weight-space from deltas in fine-tuned checkpoints is really different from what was done here, because it requires doing a lot of backward passes on a lot of data. Deriving task vectors in activation-space, as done in this new work, requires only a single forward pass on a truly tiny amount of data. So the data-efficiency and compute-efficiency of the steering power gained with this new method is orders of magnitude better, in my view.

Also, taking affine combinations in weight-space is not novel to Schmidt et al either. If nothing else, the Stable Diffusion community has been doing that since October to add and subtract capabilities from models.
- Dan H 15 May 2023 14:00 UTC
  LW: 13 AF: 6
  13
  AF Parent
  It’s a good observation that it’s more efficient; does it trade off performance? (These sorts of comparisons would probably be demanded if it was submitted to any other truth-seeking ML venue, and I apologize for consistently being the person applying the pressures that generic academics provide. It would be nice if authors would provide these comparisons.)
  Also, taking affine combinations in weight-space is not novel to Schmidt et al either. If nothing else, the Stable Diffusion community has been doing that since October to add and subtract capabilities from models.
  
  It takes months to write up these works, and since the Schmidt paper was in December, it is not obvious who was first in all senses. The usual standard is to count the time a standard-sized paper first appeared on arXiv, so the most standard sense they are first. (Inside conferences, a paper is considered prior art if it was previously published, not just if it was arXived, but outside most people just keep track of when it was arXived.) Otherwise there are arms race dynamics leading to everyone spamming snippets before doing careful, extensive science.
  - davidad 17 May 2023 18:33 UTC
    LW: 5 AF: 3
    2
    AF Parent
    Some direct quantitative comparison between activation-steering and task-vector-steering (at, say, reducing toxicity) is indeed a very sensible experiment for a peer reviewer to ask for and I would like to see it as well.