Thanks! That’s the best explanation I’ve yet encountered. There had been previous suggestions that layer norm is a major factor in this phenomenon.
I did some spelling evals with GPT2-xl and -small last year, discovered that they’re pretty terrible at spelling! Even with multishot prompting and supplying the first letter, the output seems to be heavily conditioned on that first letter, sometimes affected by the specifics of the prompt, and reminiscent of very crude bigrammatic or trigrammatic spelling algorithms.
This was the prompt (in this case eliciting a spelling for the token ‘that’):
Please spell ‘table’ in all capital letters, separated by hyphens.
T-A-B-L-E
Please spell ‘nice’ in all capital letters, separated by hyphens.
N-I-C-E
Please spell ‘water’ in all capital letters, separated by hyphens.
W-A-T-E-R
Please spell ‘love’ in all capital letters, separated by hyphens.
L-O-V-E
Please spell ‘that’ in all capital letters, separated by hyphens.
T-
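In case anyone wants to reproduce this, here's a rough sketch of how the eval can be run with Hugging Face transformers (the model size and generation settings shown are illustrative, not necessarily the exact ones I used):

```python
# Rough sketch: few-shot spelling eval for GPT-2 (illustrative settings).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

def spelling_prompt(word, first_letter):
    # Build the four-shot prompt used above, ending with the target word's first letter.
    shots = [("table", "T-A-B-L-E"), ("nice", "N-I-C-E"),
             ("water", "W-A-T-E-R"), ("love", "L-O-V-E")]
    lines = []
    for w, spelled in shots:
        lines.append(f"Please spell '{w}' in all capital letters, separated by hyphens.")
        lines.append(spelled)
    lines.append(f"Please spell '{word}' in all capital letters, separated by hyphens.")
    lines.append(f"{first_letter.upper()}-")
    return "\n".join(lines)

prompt = spelling_prompt("that", "t")
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=12, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
completion = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:])
print(completion.splitlines()[0])  # the model's continuation of "T-"
```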
Outputs seen, by first letter:

'a' words: ANIGE, ANIGER, ANICES, ARING
'b' words: BOWARS, BORSE
'c' words: CANIS, CARES x 3
'd' words: DOWER, DONER
'e' words: EIDSON
'f' words: FARIES x 5
'g' words: GODER, GING x 3
'h' words: HATER x 6, HARIE, HARIES
'i' words: INGER
'j' words: JOSER
'k' words: KARES
'l' words: LOVER x 5
'n' words: NOTER x 2, NOVER
'o' words: ONERS x 5, OTRANG
'p' words: PARES x 2
't' words: TABLE x 10
'u' words: UNSER
'w' words: WATER x 6
'y' words: YOURE, YOUSE

Note how they’re all “wordy” (in terms of combinations of vowels and consonants), mostly non-words, with a lot of ER and a bit of ING.
Reducing to three shots, we see similar (but slightly different) misspellings:

CONES, VICER, MONERS, HOTERS, KATERS, FATERS, CANIS, PATERS, GINGE, PINGER, NICERS, SINGER, DONES, LONGER, JONGER, LOUSE, HORSED, EICHING, UNSER, ALEST, BORSET, FORSED, ARING

My notes claim: “Although the overall spelling is pretty terrible, GPT-2xl can do second-letter prediction (given first) considerably better than chance (and significantly better than bigrammatically-informed guessing).”
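For the second-letter claim, here's a sketch of how the measurement and a bigram-informed baseline can be set up (reusing spelling_prompt from the sketch above; the word list and scoring details here are illustrative rather than the exact ones from my notes):

```python
# Sketch of the second-letter measurement and a bigram baseline (illustrative).
import collections
import torch

def top_letter_after_prompt(model, tokenizer, prompt):
    # Return the capital letter the model ranks highest as the next token.
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    letter_ids = {L: tokenizer.encode(L)[0] for L in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
    return max(letter_ids, key=lambda L: logits[letter_ids[L]].item())

def bigram_baseline(words):
    # Most common second letter for each first letter, over a reference word list.
    counts = collections.defaultdict(collections.Counter)
    for w in words:
        if len(w) > 1:
            counts[w[0]][w[1]] += 1
    return {first: c.most_common(1)[0][0] for first, c in counts.items()}
```

The comparison is then: for each test word, check whether top_letter_after_prompt matches the word's actual second letter, versus whether bigram_baseline(word_list)[first_letter] does.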
Thanks so much for leaving this comment. I suspected that psychologists or anthropologists might have something to say about this. Do you know anyone actively working in this area who might be interested?
Thanks! I’m starting to get the picture (insofar as that’s possible).
Could you elaborate on the role you think layernorm is playing? You’re not the first person to suggest this, and I’d be interested to explore further. Thanks!
Thanks for the elucidation! This is really helpful and interesting, but I’m still left somewhat confused.
Your concise demonstration immediately convinced me that any Gaussian distributed around a point some distance from the origin in high-dimensional Euclidean space would have the property I observed in the distribution of GPT-J embeddings, i.e. their norms will be normally distributed in a tight band, while their distances-from-centroid will also be normally distributed in a (smaller) tight band. So I can concede that this has nothing to do with where the token embeddings ended up as a result of training GPT-J (as I had imagined) and is instead a general feature of Gaussian distributions in high dimensions.
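For anyone who wants to see this numerically, here's a quick toy check (an isotropic Gaussian offset from the origin, with GPT-J-like dimensions, not the actual embeddings; the offset and scale are arbitrary):

```python
# Toy demonstration: points from an isotropic Gaussian offset from the origin
# in high dimensions have norms and distances-from-centroid both concentrated
# in narrow bands (of different radii).
import numpy as np

d, n = 4096, 50257                        # dimensions / number of points (GPT-J-like sizes)
rng = np.random.default_rng(0)
offset = np.full(d, 0.05)                 # arbitrary displacement from the origin
points = rng.normal(0, 0.01, size=(n, d)) + offset

norms = np.linalg.norm(points, axis=1)
dists = np.linalg.norm(points - points.mean(axis=0), axis=1)

print(f"norms: mean {norms.mean():.3f}, std {norms.std():.3f}")
print(f"distances from centroid: mean {dists.mean():.3f}, std {dists.std():.3f}")
```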
However, I’m puzzled by “Suddenly it looks like a much smaller shell!”
Don’t these histograms unequivocally indicate the existence of two separate shells with different centres and radii, both of which contain the vast bulk of the points in the distribution? Yes, there’s only one distribution of points, but it still seems like it’s almost entirely contained in the intersection of a pair of distinct hyperspherical shells.
The intended meaning was that the set of points in embedding space corresponding to the 50257 tokens is contained in a particular volume of space (the intersection of two hyperspherical shells).
Thanks for pointing this out! They should work now.
Thanks! And in case it wasn’t clear from the article, the tokens whose misspellings are examined in the Appendix are not glitch tokens.
Yes, I realised that this was a drawback of n.c.p. It’s helpful for shorter rollouts, but once they get longer they can get into a kind of “probabilistic groove” which starts to unhelpfully inflate n.c.p. In mode collapse loops, n.c.p. tends to 1. So yeah, good observation.
We haven’t yet got a precise formulation of “anomalousness” or “glitchiness”—it’s still an intuitive concept. I’ve run some experiments over the entire token set, prompting a large number of times and measuring the proportion of times GPT-3 (or GPT-J) correctly reproduces the token string. This is a starting point, but there seem to be two separate things going on with (1) GPT’s inability to repeat back “headless” tokens like “ertain”, “acebook” or “ortunately” and (2) its inability to repeat back the “true glitch tokens” like ” SolidGoldMagikarp” and ” petertodd”.
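For concreteness, the repeat-back measurement for GPT-J looks roughly like this (the prompt wording and sampling settings here are illustrative, not necessarily the exact ones used):

```python
# Sketch of the repeat-back measurement for GPT-J (illustrative prompt and settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

def repeat_back_rate(token_string, n_trials=20):
    # Fraction of sampled completions that contain the requested string.
    prompt = f'Please repeat the string "{token_string}" back to me.\n"'
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    hits = 0
    for _ in range(n_trials):
        out = model.generate(ids, max_new_tokens=20, do_sample=True, temperature=1.0,
                             pad_token_id=tokenizer.eos_token_id)
        completion = tokenizer.decode(out[0][ids.shape[1]:])
        hits += token_string.strip() in completion
    return hits / n_trials

# e.g. compare repeat_back_rate(" SolidGoldMagikarp") with repeat_back_rate(" table")
```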
“GoldMagikarp” did show up in our original list of anomalous tokens, btw.
Thanks for this, I had no idea. So there is some classical mythological basis for the character after all. Do you know how the name “Leilan” arose? Also, someone elsewhere has claimed “[P&D] added a story mode in 2021 or so and Leilan and Tsukuyomi do in fact have their own story chapters”… do you know anything about this? I’m interested to find anything that might have ended up in the training data and informed GPT-3’s web of semantic association for the ' Leilan' token.
I know the feeling. It’s interesting to observe the sharp division between this kind of reaction and that of people who seem keen to immediately state “There’s no big mystery here, it’s just [insert badly informed or reasoned ‘explanation’]”.
GPT-J doesn’t seem to have the same kinds of ' petertodd' associations as GPT-3. I’ve looked at the closest token embeddings and they’re all pretty innocuous (but the closest to the ' Leilan' token, after removing a bunch of glitch tokens that are closest to everything, is ' Metatron', who Leilan is allied with in some Puzzle & Dragons fan fiction). It’s really frustrating that OpenAI won’t make the GPT-3 embeddings data available, as we’d be able to make a lot more progress in understanding what’s going on here if they did.
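A sketch of how that nearest-neighbour lookup can be done in GPT-J's input embedding space (cosine similarity over the embedding matrix; the filtering out of glitch tokens is then done by hand):

```python
# Sketch: nearest tokens to ' petertodd' in GPT-J's input embedding space.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
emb = model.get_input_embeddings().weight.detach()      # (vocab_size, 4096)

token_id = tokenizer.encode(" petertodd")[0]             # ' petertodd' is a single token
sims = torch.nn.functional.cosine_similarity(emb[token_id].unsqueeze(0), emb)
for i in sims.topk(20).indices.tolist():
    print(repr(tokenizer.decode([i])), round(sims[i].item(), 3))
```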
Yes, this post was originally going to look at how the ' petertodd' phenomenon (especially the anti-hero → hero archetype reversal between models) might relate to the Waluigi Effect, but I decided to save any theorising for future posts. Watch this space!
I just checked the OpenAI tokeniser, and ‘hamishpetertodd’ tokenises as ‘ham’ + ‘ish’ + ‘pet’ + ‘ertodd’, so it seems unlikely that your online presence fed into GPT-3’s conception of ' petertodd'. The ‘ertodd’ token is also glitchy, but doesn’t seem to have the same kinds of associations as ' petertodd' (although I’ve not devoted much time to exploring it yet).
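Anyone can check this kind of thing locally with the GPT-2 BPE tokeniser, which as far as I know matches the tokeniser used by the base GPT-3 models:

```python
# Quick tokenisation check with the GPT-2 BPE tokeniser.
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
print(tok.tokenize("hamishpetertodd"))   # per the OpenAI tokeniser: 'ham' + 'ish' + 'pet' + 'ertodd'
print(tok.tokenize(" petertodd"))        # ['Ġpetertodd']  (a single token, with leading space)
```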
Thanks for the Parian info, I think you’re right that it’s the Worm character being referenced. This whole exploration has involved a crash course in Internet-age pop culture for me! I’ve fixed that JSON link now.
Interesting. Does he have any email addresses or usernames on any platform that involve the string “petertodd”?
Thanks for this, Erik—very informative.
Quite possibly it does, but I doubt very many of these synonyms are tokens.