Thanks! That’s the best explanation I’ve yet encountered. There had been previous suggestions that layer norm is a major factor in this phenomenon.
I did some spelling evals with GPT2-xl and -small last year, discovered that they’re pretty terrible at spelling! Even with multishot prompting and supplying the first letter, the output seems to be heavily conditioned on that first letter, sometimes affected by the specifics of the prompt, and reminiscent of very crude bigrammatic or trigrammatic spelling algorithms.
This was the prompt (in this case eliciting a spelling for the token ‘that’):
Please spell ‘table’ in all capital letters, separated by hyphens.
T-A-B-L-E
Please spell ‘nice’ in all capital letters, separated by hyphens.
N-I-C-E
Please spell ‘water’ in all capital letters, separated by hyphens.
W-A-T-E-R
Please spell ‘love’ in all capital letters, separated by hyphens.
L-O-V-E
Please spell ‘that’ in all capital letters, separated by hyphens.
T-
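In case anyone wants to reproduce this, here's a rough sketch of how the eval can be run with Hugging Face transformers (the model size and generation settings shown are illustrative, not necessarily the exact ones I used):

```python
# Rough sketch: few-shot spelling eval for GPT-2 (illustrative settings).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

def spelling_prompt(word, first_letter):
    # Build the four-shot prompt used above, ending with the target word's first letter.
    shots = [("table", "T-A-B-L-E"), ("nice", "N-I-C-E"),
             ("water", "W-A-T-E-R"), ("love", "L-O-V-E")]
    lines = []
    for w, spelled in shots:
        lines.append(f"Please spell '{w}' in all capital letters, separated by hyphens.")
        lines.append(spelled)
    lines.append(f"Please spell '{word}' in all capital letters, separated by hyphens.")
    lines.append(f"{first_letter.upper()}-")
    return "\n".join(lines)

prompt = spelling_prompt("that", "t")
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=12, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
completion = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:])
print(completion.splitlines()[0])  # the model's continuation of "T-"
```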
Outputs seen, by first letter:

'a' words: ANIGE, ANIGER, ANICES, ARING
'b' words: BOWARS, BORSE
'c' words: CANIS, CARES x 3
'd' words: DOWER, DONER
'e' words: EIDSON
'f' words: FARIES x 5
'g' words: GODER, GING x 3
'h' words: HATER x 6, HARIE, HARIES
'i' words: INGER
'j' words: JOSER
'k' words: KARES
'l' words: LOVER x 5
'n' words: NOTER x 2, NOVER
'o' words: ONERS x 5, OTRANG
'p' words: PARES x 2
't' words: TABLE x 10
'u' words: UNSER
'w' words: WATER x 6
'y' words: YOURE, YOUSE

Note how they’re all “wordy” (in terms of combinations of vowels and consonants), mostly non-words, with a lot of ER and a bit of ING.
Reducing to three shots, we see similar (but slightly different) misspellings:

CONES, VICER, MONERS, HOTERS, KATERS, FATERS, CANIS, PATERS, GINGE, PINGER, NICERS, SINGER, DONES, LONGER, JONGER, LOUSE, HORSED, EICHING, UNSER, ALEST, BORSET, FORSED, ARING

My notes claim: “Although the overall spelling is pretty terrible, GPT-2xl can do second-letter prediction (given first) considerably better than chance (and significantly better than bigrammatically-informed guessing).”
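For the second-letter claim, here's a sketch of how the measurement and a bigram-informed baseline can be set up (reusing spelling_prompt from the sketch above; the word list and scoring details here are illustrative rather than the exact ones from my notes):

```python
# Sketch of the second-letter measurement and a bigram baseline (illustrative).
import collections
import torch

def top_letter_after_prompt(model, tokenizer, prompt):
    # Return the capital letter the model ranks highest as the next token.
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    letter_ids = {L: tokenizer.encode(L)[0] for L in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
    return max(letter_ids, key=lambda L: logits[letter_ids[L]].item())

def bigram_baseline(words):
    # Most common second letter for each first letter, over a reference word list.
    counts = collections.defaultdict(collections.Counter)
    for w in words:
        if len(w) > 1:
            counts[w[0]][w[1]] += 1
    return {first: c.most_common(1)[0][0] for first, c in counts.items()}
```

The comparison is then: for each test word, check whether top_letter_after_prompt matches the word's actual second letter, versus whether bigram_baseline(word_list)[first_letter] does.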
Thanks so much for leaving this comment. I suspected that psychologists or anthropologists might have something to say about this. Do you know anyone actively working in this area who might be interested?
Thanks! I’m starting to get the picture (insofar as that’s possible).
Could you elaborate on the role you think layernorm is playing? You’re not the first person to suggest this, and I’d be interested to explore further. Thanks!
Thanks for the elucidation! This is really helpful and interesting, but I’m still left somewhat confused.
Your concise demonstration immediately convinced me that any Gaussian distributed around a point some distance from the origin in high-dimensional Euclidean space would have the property I observed in the distribution of GPT-J embeddings, i.e. their norms will be normally distributed in a tight band, while their distances-from-centroid will also be normally distributed in a (smaller) tight band. So I can concede that this has nothing to do with where the token embeddings ended up as a result of training GPT-J (as I had imagined) and is instead a general feature of Gaussian distributions in high dimensions.
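For anyone who wants to see this numerically, here's a quick toy check (an isotropic Gaussian offset from the origin, with GPT-J-like dimensions, not the actual embeddings; the offset and scale are arbitrary):

```python
# Toy demonstration: points from an isotropic Gaussian offset from the origin
# in high dimensions have norms and distances-from-centroid both concentrated
# in narrow bands (of different radii).
import numpy as np

d, n = 4096, 50257                        # dimensions / number of points (GPT-J-like sizes)
rng = np.random.default_rng(0)
offset = np.full(d, 0.05)                 # arbitrary displacement from the origin
points = rng.normal(0, 0.01, size=(n, d)) + offset

norms = np.linalg.norm(points, axis=1)
dists = np.linalg.norm(points - points.mean(axis=0), axis=1)

print(f"norms: mean {norms.mean():.3f}, std {norms.std():.3f}")
print(f"distances from centroid: mean {dists.mean():.3f}, std {dists.std():.3f}")
```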
However, I’m puzzled by “Suddenly it looks like a much smaller shell!”
Don’t these histograms unequivocally indicate the existence of two separate shells with different centres and radii, both of which contain the vast bulk of the points in the distribution? Yes, there’s only one distribution of points, but it still seems like it’s almost entirely contained in the intersection of a pair of distinct hyperspherical shells.
The intended meaning was that the set of points in embedding space corresponding to the 50257 tokens is contained in a particular volume of space (the intersection of two hyperspherical shells).
Thanks for pointing this out! They should work now.
Thanks! And in case it wasn’t clear from the article, the tokens whose misspellings are examined in the Appendix are not glitch tokens.
Yes, I realised that this was a drawback of n.c.p. It’s helpful for shorter rollouts, but once they get longer they can get into a kind of “probabilistic groove” which starts to unhelpfully inflate n.c.p. In mode collapse loops, n.c.p. tends to 1. So yeah, good observation.
We haven’t yet got a precise formulation of “anomalousness” or “glitchiness”—it’s still an intuitive concept. I’ve run some experiments over the entire token set, prompting a large number of times and measuring the proportion of times GPT-3 (or GPT-J) correctly reproduces the token string. This is a starting point, but there seem to be two separate things going on with (1) GPT’s inability to repeat back “headless” tokens like “ertain”, “acebook” or “ortunately” and (2) its inability to repeat back the “true glitch tokens” like ” SolidGoldMagikarp” and ” petertodd”.
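For concreteness, the repeat-back measurement for GPT-J looks roughly like this (the prompt wording and sampling settings here are illustrative, not necessarily the exact ones used):

```python
# Sketch of the repeat-back measurement for GPT-J (illustrative prompt and settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

def repeat_back_rate(token_string, n_trials=20):
    # Fraction of sampled completions that contain the requested string.
    prompt = f'Please repeat the string "{token_string}" back to me.\n"'
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    hits = 0
    for _ in range(n_trials):
        out = model.generate(ids, max_new_tokens=20, do_sample=True, temperature=1.0,
                             pad_token_id=tokenizer.eos_token_id)
        completion = tokenizer.decode(out[0][ids.shape[1]:])
        hits += token_string.strip() in completion
    return hits / n_trials

# e.g. compare repeat_back_rate(" SolidGoldMagikarp") with repeat_back_rate(" table")
```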
“GoldMagikarp” did show up in our original list of anomalous tokens, btw.
Thanks for this, I had no idea. So there is some classical mythological basis for the character after all. Do you know how the name “Leilan” arose? Also, someone elsewhere has claimed “[P&D] added a story mode in 2021 or so and Leilan and Tsukuyomi do in fact have their own story chapters”… do you know anything about this? I’m interested to find anything that might have ended up in the training data and informed GPT-3’s web of semantic association for the ' Leilan' token.
I know the feeling. It’s interesting to observe the sharp division between this kind of reaction and that of people who seem keen to immediately state “There’s no big mystery here, it’s just [insert badly informed or reasoned ‘explanation’]”.
GPT-J doesn’t seem to have the same kinds of ' petertodd' associations as GPT-3. I’ve looked at the closest token embeddings and they’re all pretty innocuous (but the closest to the ' Leilan' token, after removing a bunch of glitch tokens that are closest to everything, is ' Metatron', who Leilan is allied with in some Puzzle & Dragons fan fiction). It’s really frustrating that OpenAI won’t make the GPT-3 embeddings data available, as we’d be able to make a lot more progress in understanding what’s going on here if they did.
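A sketch of how that nearest-neighbour lookup can be done in GPT-J's input embedding space (cosine similarity over the embedding matrix; the filtering out of glitch tokens is then done by hand):

```python
# Sketch: nearest tokens to ' petertodd' in GPT-J's input embedding space.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
emb = model.get_input_embeddings().weight.detach()      # (vocab_size, 4096)

token_id = tokenizer.encode(" petertodd")[0]             # ' petertodd' is a single token
sims = torch.nn.functional.cosine_similarity(emb[token_id].unsqueeze(0), emb)
for i in sims.topk(20).indices.tolist():
    print(repr(tokenizer.decode([i])), round(sims[i].item(), 3))
```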
Yes, this post was originally going to look at how the ' petertodd' phenomenon (especially the anti-hero → hero archetype reversal between models) might relate to the Waluigi Effect, but I decided to save any theorising for future posts. Watch this space!
I just checked the OpenAI tokeniser, and ‘hamishpetertodd’ tokenises as ‘ham’ + ‘ish’ + ‘pet’ + ‘ertodd’, so it seems unlikely that your online presence fed into GPT-3’s conception of ' petertodd'. The ‘ertodd’ token is also glitchy, but doesn’t seem to have the same kinds of associations as ' petertodd' (although I’ve not devoted much time to exploring it yet).
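Anyone can check this kind of thing locally with the GPT-2 BPE tokeniser, which as far as I know matches the tokeniser used by the base GPT-3 models:

```python
# Quick tokenisation check with the GPT-2 BPE tokeniser.
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
print(tok.tokenize("hamishpetertodd"))   # per the OpenAI tokeniser: 'ham' + 'ish' + 'pet' + 'ertodd'
print(tok.tokenize(" petertodd"))        # ['Ġpetertodd']  (a single token, with leading space)
```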
Thanks for the Parian info, I think you’re right that it’s the Worm character being referenced. This whole exploration has involved a crash course in Internet-age pop culture for me! I’ve fixed that JSON link now.
Interesting. Does he have any email addresses or usernames on any platform that involve the string “petertodd”?
Thanks for this, Erik—very informative.
Quite possibly it does, but I doubt very many of these synonyms are tokens.