You’re right; however, the results from the Steering GPT-2-XL post showed that in GPT-2-XL, similar modifications had very little effect on model perplexity. The patched model also doesn’t only shift weight from b to a: it has wonky effects on other digits as well. For example, in the 3-1 patch for input 4, the weight given to 9 increased substantially. More interestingly, it is not too uncommon to find examples where a seemingly random digit suddenly becomes the most likely. The 1-8 patch for input 9 is an example:
Yup. You should be able to see this in the chart.
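(For anyone who wants to reproduce this kind of check, here is a minimal sketch of reading off the full digit-token distribution, rather than only the a/b pair, before and after a patch. This is not the code behind the results above; the model name, layer index, prompt, and the patch hook are all illustrative assumptions.)

```python
# Minimal sketch: compare next-token probabilities over all ten digit tokens
# for a baseline run vs. a run with a (hypothetical) activation-patch hook.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl").eval()

# GPT-2 tokenizes " 0" .. " 9" as single tokens, so each digit maps to one id.
digit_ids = {d: tok.encode(f" {d}")[0] for d in range(10)}

def digit_distribution(prompt, hook=None, layer=24):
    """Next-token probabilities over the ten digit tokens.

    If `hook` is given, it is installed as a forward hook on an illustratively
    chosen transformer block (layer 24 here is an assumption), which is where an
    activation patch could be applied.
    """
    handle = model.transformer.h[layer].register_forward_hook(hook) if hook else None
    try:
        with torch.no_grad():
            logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
    finally:
        if handle is not None:
            handle.remove()
    probs = logits.softmax(-1)
    return {d: probs[i].item() for d, i in digit_ids.items()}

# Comparing the two dicts digit by digit makes side effects on unrelated digits
# (e.g. probability suddenly piling onto 9) easy to spot.
baseline = digit_distribution("4 +")  # hypothetical prompt
# patched = digit_distribution("4 +", hook=my_patch_hook)  # my_patch_hook is hypothetical
```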