I have a couple of basic questions:

1. Shouldn’t diagonal elements in the perplexity table all be equal to the baseline (since the addition should be 0)?

2. I’m a bit confused about the use of perplexity here. The added vector introduces bias (away from one digit and towards another). It shouldn’t be surprising that perplexity increases? Eyeballing the visualizations, they do all seem to shift mass away from b and towards a.
On the first question: yup. You should be able to see this in the chart.
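To see why the diagonal should match the baseline: the a-b patch is exactly the zero vector when a == b, so the "patched" forward pass is just the unmodified model. Here is a minimal sketch of that, with placeholder choices (GPT-2 small, layer 6, an embedding-difference patch vector) rather than the actual setup from the post:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative only: "gpt2" (small), layer 6, and building the a-b vector from
# token embeddings are placeholder choices, not the setup used in the post.
model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

def digit_embedding(d: int) -> torch.Tensor:
    """Embedding of the single-token digit string, e.g. "3"."""
    token_id = tok(str(d))["input_ids"][0]
    return model.transformer.wte.weight[token_id].detach()

def make_patch(a: int, b: int) -> torch.Tensor:
    # The a-b patch: push the residual stream towards digit a and away from b.
    # On the diagonal (a == b) this is exactly the zero vector.
    return digit_embedding(a) - digit_embedding(b)

def add_patch(layer_idx: int, patch: torch.Tensor):
    """Add `patch` to the residual stream at every position of one layer."""
    def hook(module, inputs, output):
        # GPT2Block returns a tuple whose first element is the hidden states.
        return (output[0] + patch,) + output[1:]
    return model.transformer.h[layer_idx].register_forward_hook(hook)

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt")["input_ids"]
    loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return float(torch.exp(loss))

baseline = perplexity("1 2 3 4 5 6 7 8 9")
handle = add_patch(layer_idx=6, patch=make_patch(3, 3))  # diagonal: zero vector
diagonal = perplexity("1 2 3 4 5 6 7 8 9")
handle.remove()
print(baseline, diagonal)  # identical: adding the zero vector is a no-op
```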
On the second question: you’re right. However, the results from the Steering GPT-2-XL post showed that in GPT-2-XL similar modifications had very little effect on model perplexity. The patched model also doesn’t only shift weight from b to a; it has wonky effects on other digits as well. For example, in the 3-1 patch for input 4, the weight given to 9 increased substantially. More interestingly, it is not uncommon to find examples in which seemingly random digits suddenly become the most likely. The 1-8 patch for input 9 is an example:
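If you want to poke at these distributions yourself, here is a rough sketch of how one might compare the next-digit distribution with and without a patch. The same caveats as the sketch above apply (GPT-2 small, layer 6, an embedding-difference patch are placeholder choices), and treating "input 9" as simply the prompt "9" is likewise an assumption:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Same illustrative assumptions as the earlier sketch: "gpt2" small, layer 6,
# an embedding-difference patch; "input 9" is treated as just the prompt "9".
model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

digit_ids = [tok(str(d))["input_ids"][0] for d in range(10)]

def patch_vector(a: int, b: int) -> torch.Tensor:
    wte = model.transformer.wte.weight
    return (wte[digit_ids[a]] - wte[digit_ids[b]]).detach()

@torch.no_grad()
def next_digit_probs(prompt: str, a=None, b=None, layer=6) -> torch.Tensor:
    """Probability over the ten digit tokens at the next position,
    optionally under an a-b patch added to one layer's residual stream."""
    handle = None
    if a is not None:
        patch = patch_vector(a, b)
        def hook(module, inputs, output):
            return (output[0] + patch,) + output[1:]
        handle = model.transformer.h[layer].register_forward_hook(hook)
    ids = tok(prompt, return_tensors="pt")["input_ids"]
    logits = model(ids).logits[0, -1]  # next-token logits at the last position
    if handle is not None:
        handle.remove()
    probs = torch.softmax(logits, dim=-1)[digit_ids]
    return probs / probs.sum()  # renormalise over the ten digits

base = next_digit_probs("9")
patched = next_digit_probs("9", a=1, b=8)  # the 1-8 patch for input 9
for d in range(10):
    print(d, f"{base[d].item():.3f} -> {patched[d].item():.3f}")
```

Printed side by side like this, it is easy to spot cases where probability moves not just from b to a but onto digits that have nothing to do with the patch.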