mwatkins

Karma: 1,705

mwatkins Jul 31, 2023, 10:32 PM
3 points
2
in reply to: AdamYedidia’s comment on: The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited
Yes, I realised that this was a downfall of n.c.p. It’s helpful for shorter rollouts, but once they get longer they can get into a kind of “probabilistic groove” which starts to unhelpfully inflate n.c.p. In mode collapse loops, n.c.p. tends to 1. So yeah, good observation.

The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

mwatkinsJul 31, 2023, 7:47 PM

85 points

29 comments20 min readLW link

mwatkins May 1, 2023, 1:15 AM
3 points
0
in reply to: scottviteri’s comment on: SolidGoldMagikarp (plus, prompt generation)
We haven’t yet got a precise formulation of “anomalousness” or “glitchiness”—it’s still an intuitive concept. I’ve run some experiments over the entire token set, prompting a large number of times and measuring the proportion of times GPT-3 (or GPT-J) correctly reproduces the token string. This is a starting point, but there seem to be two separate things going on with (1) GPT’s inability to repeat back “headless” tokens like “ertain”, “acebook” or “ortunately” and (2) its inability to repeat back the “true glitch tokens” like ” SolidGoldMagikarp” and ” petertodd”.
“GoldMagikarp” did show up in our original list of anomalous tokens, btw.

mwatkins Apr 30, 2023, 10:44 PM
2 points
0
in reply to: A Parker’s comment on: SolidGoldMagikarp III: Glitch token archaeology
Thanks for this, I had no idea. So there is some classical mythological basis for the character after all. Do you how the name “Leilan” arose? Also, someone elsewhere has claimed “[P&D] added a story mode in 2021 or so and Leilan and Tsukuyomi do in fact have their own story chapters”… do you know anything about this? I’m interested to find anything that might have ended up in the training data and informed GPT-3′s web of semantic association for the ” Leilan” token.

mwatkins Apr 25, 2023, 7:22 PM
5 points
2
in reply to: rime’s comment on: The ‘ petertodd’ phenomenon
I know the feeling. It’s interesting to observe the sharp division between this kind of reaction and that of people who seem keen to immediately state “There’s no big mystery here, it’s just [insert badly informed or reasoned ‘explanation’]”.

mwatkins Apr 17, 2023, 8:54 AM
7 points
0
in reply to: Jan_Kulveit’s comment on: The ‘ petertodd’ phenomenon
GPT-J doesn’t seem to have the same kinds of ′ petertodd’ associations as GPT-3. I’ve looked at the closest token embeddings and they’re all pretty innocuous (but the closest to the ′ Leilan’ token, removing a bunch of glitch tokens that are closest to everything is ′ Metatron’, who Leilan is allied with in some Puzzle & Dragons fan fiction). It’s really frustrating that OpenAI won’t make the GPT-3 embeddings data available, as we’d be able to make a lot more progress in understanding what’s going on here if they did.

mwatkins Apr 16, 2023, 7:39 PM
13 points
9
in reply to: awg’s comment on: The ‘ petertodd’ phenomenon
Yes, this post was originally going to look at how the ′ petertodd’ phenomenon (especially the anti-hero → hero archetype reversal between models) might relate to the Waluigi Effect, but I decided to save any theorising for future posts. Watch this space!

mwatkins Apr 16, 2023, 2:23 PM
8 points
0
in reply to: hamishtodd1’s comment on: The ‘ petertodd’ phenomenon
I just checked the Open AI tokeniser, and ‘hamishpetertodd’ tokenises as ‘ham’ + ‘ish’ + ‘pet’ + ‘ertodd’, so it seems unlikely that your online presence fed into GPT-3′s conception of ′ petertodd’. The ‘ertodd’ token is also glitchy, but doesn’t seem to have the same kinds of associations as ′ petertodd’ (although I’ve not devoted much time to exploring it yet).

mwatkins Apr 16, 2023, 12:13 PM
2 points
0
in reply to: hyje’s comment on: The ‘ petertodd’ phenomenon
Thanks for the Parian info, I think you’re right that it’s the Worm character being referenced. This whole exploration has involved a crash course in Internet-age pop culture for me! I’ve fixed that JSON link now.

mwatkins Apr 16, 2023, 11:19 AM
5 points
0
in reply to: geoffreymiller’s comment on: The ‘ petertodd’ phenomenon
Interesting. Does he have any email addresses or usernames on any platform that involve the string “petertodd”?

The ‘ petertodd’ phenomenon

mwatkinsApr 15, 2023, 12:59 AM

192 points

50 comments38 min readLW link 1 review

mwatkins Mar 15, 2023, 10:48 PM
1 point
0
in reply to: Erik Søe Sørensen’s comment on: SolidGoldMagikarp III: Glitch token archaeology
Thanks for this, Erik—very informative.

mwatkins Mar 10, 2023, 2:33 PM
1 point
0
in reply to: John Andonuts’s comment on: SolidGoldMagikarp III: Glitch token archaeology
Thanks for the “Steve” clue. That makes sense. I’ve added a footnote.
I don’t think any of the glitch tokens got into the token set through sheer popularity of a franchise. The best theories I’m hearing involved ‘mangled text dumps’ from gaming, e-commerce and blockchain logs somehow ending up in the data set used to create the tokens. 20% of that dataset is publicly available, and someone’s already found some mangled PnD text in there (so lots of stats, character names repeated over and over). No one seems to be able to explain the weird Uma Musume token (that may require contact with an obsessive fan, which I don’t particularly welcome).

mwatkins Mar 8, 2023, 7:54 PM
1 point
0
in reply to: DKPL’s comment on: SolidGoldMagikarp III: Glitch token archaeology
Good find! I’ve integrated that into the post.

mwatkins Mar 8, 2023, 12:41 PM
3 points
0
in reply to: MiguelDev’s comment on: SolidGoldMagikarp (plus, prompt generation)
The ′ petertodd’ token definitely has some strong “trickster” energy in many settings. But it’s a real shapeshifter. Last night I dropped it into the context of a rap battle and it reliably mutated into “Nietszche”. Stay tuned for a thorough research report on the ′ petertodd’ phenomenon.

mwatkins Mar 8, 2023, 12:38 PM
1 point
0
in reply to: wednesdei’s comment on: SolidGoldMagikarp (plus, prompt generation)
A lot of them do look like that, but we’ve dug deep to find their true origins, and it’s all pretty random and diffuse. See Part III (https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology). Bear in mind that when GPT-3 is given a token like “EStreamFrame”, it doesn’t “see” what’s “inside” like we do ([“E”, “S”, “t”, “r”, “e”, “a”, “m”, “F”, “r”, “a”, “m”, “e”]). It receives it as a kind of atomic unit of language with no internal structure. Anything it “learns about” this token in training is based on where it sees it used, and it’s looking like most of these glitch tokens correspond to strings seen very infrequently in the training data (but which for some reason got into the tokenisation dataset in large numbers, probably via junk files like mangled text dumps from gaming logs, etc.).

mwatkins Mar 8, 2023, 12:34 PM
2 points
0
in reply to: Trae “tchesket” Hesket’s comment on: SolidGoldMagikarp (plus, prompt generation)
What we’re now finding is that there’s a “continuum of glitchiness”. Some tokens glitch worse/harder than others in a way that I’ve devised an ad-hoc metric for (research report coming soon). There are a lot of “mildly glitchy” tokens that GPT-3 will try to avoid repeating which look like “velength” and “oldemort” (obviously parts of longer, familiar words, rarely seen isolated in text). There’s a long list of these in Part II of this post. I’d not seen “ocobo” or “oldemort” yet, but I’m systematically running tests on the whole vocabulary.

mwatkins Feb 27, 2023, 6:43 PM
1 point
0
in reply to: Wil Roberts’s comment on: SolidGoldMagikarp II: technical details and more recent findings
OK. That’s both superficially disappointing and deeply reassuring!

mwatkins Feb 27, 2023, 3:56 PM
2 points
0
in reply to: mwatkins’s comment on: SolidGoldMagikarp II: technical details and more recent findings
Something you might want to try: replace the tokens in your prompt with random strings, or randomly selected non-glitch tokens, and see what kind of completions you get.

mwatkins Feb 27, 2023, 3:00 PM
1 point
0
in reply to: Wil Roberts’s comment on: SolidGoldMagikarp II: technical details and more recent findings
Was this text-davinci-003?

mwatkins

The “spel­ling mir­a­cle”: GPT-3 spel­ling abil­ities and glitch to­kens revisited

The ‘ pe­ter­todd’ phenomenon

The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

The ‘ petertodd’ phenomenon