Stuart_Armstrong comments on SolidGoldMagikarp (plus, prompt generation)

Stuart_Armstrong 6 Feb 2023 13:10 UTC
LW: 3 AF: 2
As we discussed, I suspect that the tokens were added for some reason but then never trained on. That would explain why they are close to the origin, and why the algorithm goes wrong on them: it just isn’t trained on them at all.

Good work on this post.
- mwatkins 6 Feb 2023 17:26 UTC
  7 points
  As you’ll read in the sequel (which we’ll post later today), in GPT2-xl, the anomalous tokens tend to be as far from the origin as possible. Horizontal axis is distance from centroid. Upper histograms involve 133 tokens, lower histograms involve 50,257 tokens. Note how the spikes in the upper figures register as small bumps on those below.
  At this point we don’t know where the token embeddings lie relative to the centroid in GPT-3 embedding spaces, as that data is not yet publicly available. And all the bizarre behaviour we’ve been documenting has been in GPT-3 models (despite discovering the “triggering” tokens in GPT-2/J embedding spaces).
  - Stuart_Armstrong 8 Feb 2023 11:46 UTC
    2 points
    Thanks! Yes, this is some weird behaviour.
    
    Keep me posted on any updates!
  - mwatkins 6 Feb 2023 17:31 UTC
    1 point
    3-shot prompting experiments with GPT2 and J models show that distance from centroid may contribute to anomalous behaviour, but it can’t be the sole cause.
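
The thread refers to two different measurements, distance from the origin and distance from the centroid (the mean of all token embeddings). A minimal sketch of how each could be computed and binned into a histogram is below. It assumes the embeddings are available as a NumPy array with one row per token; the function names and the random stand-in data are hypothetical and are not taken from the post or from any model's actual weights.

```python
import numpy as np

def distances_from_centroid(embeddings: np.ndarray) -> np.ndarray:
    """Euclidean distance of each token embedding (row) from the mean embedding."""
    centroid = embeddings.mean(axis=0)
    return np.linalg.norm(embeddings - centroid, axis=1)

def distances_from_origin(embeddings: np.ndarray) -> np.ndarray:
    """Euclidean norm of each token embedding (row)."""
    return np.linalg.norm(embeddings, axis=1)

# Hypothetical stand-in for an embedding matrix: 1000 "tokens" in 16 dimensions.
# A real experiment would load the model's token embedding matrix here instead.
rng = np.random.default_rng(0)
embeddings = rng.normal(loc=0.5, scale=1.0, size=(1000, 16))

d_centroid = distances_from_centroid(embeddings)
d_origin = distances_from_origin(embeddings)

# Bin the centroid distances, as one would for a histogram plot.
counts, bin_edges = np.histogram(d_centroid, bins=20)
```

The two measurements coincide only when the centroid sits at the origin, so a token can be near one and far from the other; comparing them on a subset of tokens against the full vocabulary is one way to check which description fits.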