
Glitch Tokens

Tag · Last edit: 18 Apr 2023 5:31 UTC by CronoDAS

Glitch Tokens are tokens in a language model's vocabulary that cause anomalous output when the model encounters them, such as SolidGoldMagikarp.

The ‘ petertodd’ phenomenon

mwatkins · 15 Apr 2023 0:59 UTC
192 points
49 comments · 38 min read · LW link

SolidGoldMagikarp (plus, prompt generation)

5 Feb 2023 22:02 UTC
676 points
205 comments · 12 min read · LW link

SolidGoldMagikarp III: Glitch token archaeology

14 Feb 2023 10:17 UTC
91 points
32 comments · 16 min read · LW link

‘ petertodd’’s last stand: The final days of open GPT-3 research

mwatkins · 22 Jan 2024 18:47 UTC
109 points
16 comments · 45 min read · LW link

Anomalous tokens reveal the original identities of Instruct models

9 Feb 2023 1:30 UTC
139 points
16 comments · 9 min read · LW link
(generative.ink)

SolidGoldMagikarp II: technical details and more recent findings

6 Feb 2023 19:09 UTC
111 points
45 comments · 13 min read · LW link

Mapping the semantic void: Strange goings-on in GPT embedding spaces

mwatkins · 14 Dec 2023 13:10 UTC
114 points
31 comments · 14 min read · LW link

What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins · 19 Apr 2024 13:43 UTC
40 points
13 comments · 27 min read · LW link

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)

Lao Mein · 20 Sep 2024 13:13 UTC
37 points
7 comments · 5 min read · LW link

Glitch Token Catalog - (Almost) a Full Clear

Lao Mein · 21 Sep 2024 12:22 UTC
37 points
3 comments · 37 min read · LW link

SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4

AdamYedidia · 15 Apr 2023 22:35 UTC
71 points
18 comments · 6 min read · LW link

Linear encoding of character-level information in GPT-J token embeddings

10 Nov 2023 22:19 UTC
34 points
4 comments · 28 min read · LW link

The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

mwatkins · 31 Jul 2023 19:47 UTC
85 points
29 comments · 20 min read · LW link

Nokens: A potential method of investigating glitch tokens

Hoagy · 15 Mar 2023 16:23 UTC
21 points
0 comments · 4 min read · LW link

A Search for More ChatGPT / GPT-3.5 / GPT-4 “Unspeakable” Glitch Tokens

Martin Fell · 9 May 2023 14:36 UTC
26 points
9 comments · 6 min read · LW link

LLMs Universally Learn a Feature Representing Token Frequency / Rarity

Sean Osier · 30 Jun 2024 2:48 UTC
12 points
5 comments · 6 min read · LW link
(github.com)

(redacted) Anomalous tokens might disproportionately affect complex language tasks

nikola · 15 Jul 2023 0:48 UTC
4 points
0 comments · 7 min read · LW link

An examination of GPT-2’s boring yet effective glitch

MiguelDev · 18 Apr 2024 5:26 UTC
5 points
3 comments · 3 min read · LW link