As you’ll read in the sequel (which we’ll post later today), in GPT2-xl the anomalous tokens tend to be as far from the origin as possible. The horizontal axis is distance from centroid. The upper histograms involve 133 tokens, the lower histograms 50,257 tokens. Note how the spikes in the upper figures register as small bumps on those below.
At this point we don’t know where the token embeddings lie relative to the centroid in GPT-3 embedding spaces, as that data is not yet publicly available. And all the bizarre behaviour we’ve been documenting has been in GPT-3 models (despite our discovering the “triggering” tokens in GPT-2/J embedding spaces).
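For anyone who wants to reproduce this kind of histogram, here is a minimal sketch of the distance computation. It is not the code behind the figures: the embedding matrix is a random stand-in (a real run would load the model's token embedding matrix instead), and `centroid_distances` and `anomalous_ids` are hypothetical names chosen for illustration.

```python
import numpy as np

def centroid_distances(embeddings):
    """Return (distance from centroid, distance from origin) for each row."""
    centroid = embeddings.mean(axis=0)  # mean embedding over all tokens
    dist_centroid = np.linalg.norm(embeddings - centroid, axis=1)
    dist_origin = np.linalg.norm(embeddings, axis=1)
    return dist_centroid, dist_origin

# Assumption: synthetic data standing in for a (vocab_size, dim) embedding matrix.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))

# Assumption: hypothetical indices standing in for the anomalous-token subset.
anomalous_ids = np.array([3, 17, 42])

d_centroid, d_origin = centroid_distances(emb)

# Bin all tokens first, then reuse the same bin edges for the subset so the
# two histograms share a horizontal axis and can be compared directly.
hist_all, edges = np.histogram(d_centroid, bins=20)
hist_subset, _ = np.histogram(d_centroid[anomalous_ids], bins=edges)
```

Sharing the bin edges is what lets spikes in the small-subset histogram be lined up against bumps in the full-vocabulary one.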
3-shot prompting experiments with GPT2 and J models show that distance from centroid may contribute to anomalous behaviour, but it can’t be the sole cause.
Thanks! Yes, this is some weird behaviour.
Keep me posted on any updates!