Thanks, I appreciate the suggestion. There’s definitely a lot of room to go into more depth, and I’ll check that out.
Thanks, I’ll rephrase that part for clarity
A Search for More ChatGPT / GPT-3.5 / GPT-4 “Unspeakable” Glitch Tokens
In case anyone is interested or finds them useful, I did a bit more of a search for current ChatGPT glitch tokens, from tokens 86000 to 96000, and found quite a few more; the ones listed below were the most extreme. I excluded tokens that just appeared to be “word completions”, as those are quite common. Note the three in a row:
Token 89473: “useRalativeImagePath”
Token 89472: “useRalative”
Token 89471: “useRal”
Token 87914: “ YYSTACK”
Token 87551: “CppGuid”
Token 86415: “BundleOrNil”
Token 86393: “ PropelException”
Token 93905: “ QtAws”
Token 93304: “VertexUvs”
Token 92103: “NavigatorMove”
Token 94823: “textTheme”
Token 94652: “BracketAccess”
Token 95812: “ RTCK” (initial character is a tab)
Token 97736: “ RTCT” (initial character is a tab)
Token 97784: “ JSBracketAccess”

Some of the more interesting responses I got during the search:
And I even got some spontaneous humour from ChatGPT:
Also worth noting that after testing several of these, they do seem to work on Bing too, which makes sense given that Bing Chat reportedly runs on the same underlying GPT-4 model and tokenizer.
The tokens themselves are public, but not the actual embedding matrix/vectors (as far as I know).
Just out of curiosity I searched manually through tokens 96000–97999. I did find quite a few “word suffix” tokens, e.g. “oralType”, which ChatGPT 3.5 always completes to “TemporalType”. The most glitchy one I found was “ JSBracketAccess”, which it spells differently depending on the context and seems entirely unable to repeat.
(The method I used to find them was to generate a “Repeat after me:” prompt containing ~20 candidate tokens; if a glitch token is present you may get a blank or otherwise unusual response from ChatGPT.)
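For anyone who wants to try this themselves, here is a minimal sketch of the prompt-generation step using tiktoken’s cl100k_base encoding (the ID range, batch size, and function name are my own illustrative choices, not part of the original method):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by ChatGPT / GPT-4

def repeat_prompts(start_id: int, end_id: int, batch_size: int = 20):
    """Yield 'Repeat after me:' prompts covering token IDs [start_id, end_id)."""
    ids = list(range(start_id, end_id))
    for i in range(0, len(ids), batch_size):
        batch = ids[i:i + batch_size]
        # Decode each ID on its own so adjacent tokens can't merge into one string.
        pieces = [repr(enc.decode([t])) for t in batch]
        yield batch, "Repeat after me: " + " ".join(pieces)

for batch, prompt in repeat_prompts(86000, 86100):
    print(prompt)  # paste into ChatGPT; a blank or garbled reply flags the batch
```

Once a batch produces an unusual response, you can bisect it (split into two ~10-token prompts and repeat) to isolate the individual glitch token.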
I’ve also found generating exercises from text to be particularly useful, even just to make you think more about what you’re reading. It has also been useful when learning new tools, e.g. generating a load of einsum / einops exercises, which didn’t even require pasting in any additional text (a sketch of what one such exercise looks like is below). Using it to summarize code sounds interesting and not something I’ve tried before.
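For a concrete picture, here is the kind of small, self-checking exercise I have in mind; the specific task and shapes are just an illustrative example, not one of the actual generated exercises:

```python
import numpy as np
from einops import rearrange

# Exercise: convert a batch of images from channels-last (NHWC)
# to channels-first (NCHW) layout with a single rearrange call.
images = np.zeros((8, 32, 32, 3))  # (batch, height, width, channels)

# Solution:
out = rearrange(images, "b h w c -> b c h w")
assert out.shape == (8, 3, 32, 32)  # the assert makes the exercise self-checking
```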
I wonder if something like this could somehow be combined with Anki to generate randomized questions? One of the issues I’ve had when using spaced repetition for learning coding is that I often end up remembering the exact answer to questions, when really what I want to do is learn when and where to use tools to solve varied problems. I wonder if using LLMs to randomize the questions could mitigate that a bit?
For what it’s worth, most modern fusion bombs actually generate most (e.g. 80%+) of their “yield” from fission—the fusion stage is surrounded by a layer of uranium which is bombarded by neutrons produced in the fusion reaction, causing fission in the uranium and magnifying the yield. So they are pretty dirty weapons. They are at least smaller than the weapons from the 50s and 60s though.
Since glitch tokens seem to be caused by certain sequences of text appearing much more often in the tokenizer’s training corpus than in the LLM’s training data, something like that might work. But there also seem to be “glitch phrases” or “unspeakable phrases”: sequences of tokens with extremely low probability under the model, which could produce some strange behaviour too. It seems at least plausible to me that these kinds of phrases could still be generated even if countermeasures were taken to prevent glitch tokens from being created. Glitch phrases are a bit more difficult to find without access to the model, though.
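You can’t do this for ChatGPT directly, but with an open model the search is straightforward to sketch: score candidate phrases by the total log-probability the model assigns them, and look at the extreme low end. A minimal version using Hugging Face transformers, with GPT-2 as a stand-in model (the model choice and example phrases are assumptions on my part):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sequence_logprob(text: str) -> float:
    """Total log-probability the model assigns to `text`."""
    ids = tok(text, return_tensors="pt").input_ids   # (1, seq_len)
    with torch.no_grad():
        logits = model(ids).logits                   # (1, seq_len, vocab)
    # log P(token_i | tokens_<i): logits at position i-1 predict token i
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    picked = logprobs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return picked.sum().item()

# The lower the score, the closer the phrase is to "unspeakable" for the model.
print(sequence_logprob("the cat sat on the mat"))
print(sequence_logprob("mat the sat cat on the"))
```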