Glitch Tokens

Tag · Last edit: Apr 18, 2023, 5:31 AM by CronoDAS

Glitch tokens are tokens in a language model's vocabulary that cause anomalous output when the model is prompted with them; SolidGoldMagikarp is one example.

The ‘ petertodd’ phenomenon

mwatkins · Apr 15, 2023, 12:59 AM
192 points
50 comments · 38 min read · LW link · 1 review

SolidGoldMagikarp (plus, prompt generation)

Feb 5, 2023, 10:02 PM
679 points
206 comments · 12 min read · LW link · 1 review

SolidGoldMagikarp III: Glitch token archaeology

Feb 14, 2023, 10:17 AM
91 points
35 comments · 16 min read · LW link

‘ petertodd’’s last stand: The final days of open GPT-3 research

mwatkins · Jan 22, 2024, 6:47 PM
109 points
16 comments · 45 min read · LW link

Anomalous tokens reveal the original identities of Instruct models

Feb 9, 2023, 1:30 AM
139 points
16 comments · 9 min read · LW link
(generative.ink)

SolidGoldMagikarp II: technical details and more recent findings

Feb 6, 2023, 7:09 PM
113 points
45 comments · 13 min read · LW link

Exploring the petertodd / Leilan duality in GPT-2 and GPT-J

mwatkins · Dec 23, 2024, 1:17 PM
12 points
1 comment · 17 min read · LW link

What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins · Apr 19, 2024, 1:43 PM
40 points
13 comments · 27 min read · LW link

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)

Lao Mein · Sep 20, 2024, 1:13 PM
37 points
7 comments · 5 min read · LW link

Glitch Token Catalog - (Almost) a Full Clear

Lao Mein · Sep 21, 2024, 12:22 PM
38 points
3 comments · 37 min read · LW link

Mapping the semantic void: Strange goings-on in GPT embedding spaces

mwatkins · Dec 14, 2023, 1:10 PM
114 points
31 comments · 14 min read · LW link

SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4

AdamYedidia · Apr 15, 2023, 10:35 PM
71 points
18 comments · 6 min read · LW link

Linear encoding of character-level information in GPT-J token embeddings

Nov 10, 2023, 10:19 PM
34 points
4 comments · 28 min read · LW link

The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

mwatkins · Jul 31, 2023, 7:47 PM
85 points
29 comments · 20 min read · LW link

Nokens: A potential method of investigating glitch tokens

Hoagy · Mar 15, 2023, 4:23 PM
21 points
0 comments · 4 min read · LW link

A Search for More ChatGPT / GPT-3.5 / GPT-4 “Unspeakable” Glitch Tokens

Martin Fell · May 9, 2023, 2:36 PM
26 points
9 comments · 6 min read · LW link

LLMs Universally Learn a Feature Representing Token Frequency / Rarity

Sean Osier · Jun 30, 2024, 2:48 AM
12 points
5 comments · 6 min read · LW link
(github.com)

Anomalous Tokens in DeepSeek-V3 and r1

henry · Jan 25, 2025, 10:55 PM
130 points
2 comments · 7 min read · LW link

(redacted) Anomalous tokens might disproportionately affect complex language tasks

Nikola Jurkovic · Jul 15, 2023, 12:48 AM
4 points
0 comments · 7 min read · LW link

An examination of GPT-2’s boring yet effective glitch

MiguelDev · Apr 18, 2024, 5:26 AM
5 points
3 comments · 3 min read · LW link