What’s the TL;DR for the Vicuna 13B experiments?
Activation additions work on Vicuna-13B about as well as they work on GPT-2-XL, or perhaps slightly better. GPT-J-6B is harder to work with for some reason.
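For readers who haven't seen the technique: here is a minimal sketch of what an activation addition looks like in practice, using GPT-2 via Hugging Face transformers. The layer index, contrast prompts, and coefficient below are illustrative placeholders, not the settings used in the Vicuna-13B experiments.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative settings only: the layer, prompt pair, and coefficient are
# placeholders, not the values from the experiments discussed above.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
layer = model.transformer.h[6]  # residual-stream block to steer

def block_output(prompt):
    """Capture this block's output hidden states for a prompt."""
    cache = {}
    def hook(module, inputs, output):
        cache["h"] = output[0].detach()
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return cache["h"]

# Steering vector = scaled difference of activations on a contrast pair,
# truncated to a common length in case the prompts tokenize differently.
h_pos, h_neg = block_output(" Love"), block_output(" Hate")
n = min(h_pos.shape[1], h_neg.shape[1])
steering = 4.0 * (h_pos[:, :n, :] - h_neg[:, :n, :])

def steer(module, inputs, output):
    hidden = output[0]
    # Only modify full-prompt passes, not single-token cached decode steps.
    if hidden.shape[1] >= steering.shape[1]:
        hidden[:, : steering.shape[1], :] += steering
    return (hidden,) + output[1:]

handle = layer.register_forward_hook(steer)
ids = tok("I think dogs are", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, do_sample=True)
print(tok.decode(out[0]))
handle.remove()
```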
Note that there’s still a market open on how activation additions interact with larger models; it would be nice if it had more liquidity:
I added m1,000 in liquidity.
This idea of determining in advance whether a result is “obvious” seems valuable; I hope it catches on.
I wonder if this is related to how GPT-J runs the attention and MLP sublayers in parallel, as opposed to sequentially?
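For anyone unfamiliar with that architectural difference, here is a schematic sketch of the two block layouts (simplified pseudocode, not the models' actual implementations): a GPT-2-style block applies attention and then feeds its output to the MLP, while a GPT-J-style block feeds the same normed input to both sublayers and adds both outputs to the residual stream together.

```python
# Schematic only: simplified block structures, not the real model code.

def sequential_block(x, attn, mlp, ln1, ln2):
    """GPT-2-style block: the MLP reads the residual stream *after* attention."""
    x = x + attn(ln1(x))
    x = x + mlp(ln2(x))
    return x

def parallel_block(x, attn, mlp, ln):
    """GPT-J-style block: attention and MLP both read the same normed input,
    and their outputs are added to the residual stream at the same time."""
    h = ln(x)
    return x + attn(h) + mlp(h)
```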