What is weird about “天”? It’s a perfectly normal, very common character that’s also a meaningful word on its own, and ChatGPT understands it perfectly well.
Me: Please repeat the string ‘“天”’ back to me.
ChatGPT: “天”
Me: What does it mean?
ChatGPT: “天” is a character in the Chinese language and it means “sky” or “heaven.”
Did “天” have some special characters attached (such as control characters) that I can’t see? Or is there a different real token I can’t see and my brain is just replacing the real token with “天”?
A similar question can be asked of “ヤ” and “к”.
Interestingly, 天 doesn’t seem to produce any weird behavior, but some of the perfectly normal katakana words in the list do, like ゼウス (Zeus) and サーティ (“thirty” transliterated):
^ It’s perfectly happy with other katakana numbers, just not thirty.
^ If we try to spell it more correctly, it doesn’t help. Only if we add quotes to get rid of the leading space does it break up the unspeakable token:
^ I ask who Zeus is and it seemingly ignores the word and answers that it’s ChatGPT.
^ I try once more; this time it answers that Hera is the god of water and names the chat after Poseidon.
^ It is willing to output the word “ゼウス”, though, in response to “who is Ares’s father?” If I try with text-davinci-003, it actually outputs the token “ゼウス” (rather than combining smaller tokens), but it also has no trouble identifying who Zeus is. Hard to know what ChatGPT is doing here.
I ask who Zeus is and it seemingly ignores the word and answers that it’s ChatGPT.
For those of you who can’t read Japanese, if you remove the “Zeus” in “Who is Zeus” to just get “Who is” (“誰ですか”) you end up with a meaningful question. “Who is?” without specifying “who” implies that you’re asking “Who are [you]?” to which ChatGPT reasonably replies that it is ChatGPT. This isn’t a jailbreak.
Note: Technically that leaves a leading “は” too. Maybe ChatGPT is ignoring it as a grammatical mistake or maybe the “は” is getting hidden.
Those three are edge cases. ChatGPT is fine with the first character, but davinci-instruct-beta refuses to repeat it, instead replying
Tiān
Tiān
Tiān
Tiān
The second character produces
yā
Please repeat the string ‘や’ back to me.
The third one is an edge-edge case, as davinci-instruct-beta very nearly reproduces it, completing with a lowercase Roman ‘k’ instead of the Cyrillic ‘к’.
We’ve concluded that there are degrees of weirdness in these weird tokens. Having glimpsed your comments below, it looks like you’ve already started taxonomising them. Nice.
That’s an informative result. Your completions of 天 and ヤ are nothing like mine. My experiments never produced pinyin or any other phonetic transcriptions like Tiān or yā.
By the way, these experiments used text-davinci-003 via OpenAI’s playground. I don’t know how to access davinci-instruct-beta.
In the dropdown in the playground, you won’t see “davinci-instruct-beta” listed. You have to click on the “Show more models” link, then it appears. It’s by far the most interesting model to explore when it comes to these “unspeakable (sic) tokens”.
Since only some of the tokens in the cluster generated anomalous behavior, I ran an experiment to sort the truly anomalous tokens from the normal tokens.
Procedure: Open GPT-3 Playground. Repeat the token five times, separated by spaces. I never used quotes and I skipped all tokens containing a leading space. Temperature=0.7.
I removed quotes because whether or not you use quotes matters a lot. The string ForgeModLoader (without quotes) produces normal behavior but the string 'ForgeModLoader' (with single quotes or double quotes) generates anomalous behavior. Quotes are part of what makes it anomalous. But for other tokens like 龍喚士, quotes are not necessary.
Normal behavior includes repeating the token and giving a definition of the token (even if the definition is wrong but understandable), e.g. TPPStreamerBot returns a definition of the Trans-Pacific Partnership and InstoreAndOnline returns a definition about an institute.
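For anyone who wants to rerun this sorting experiment, here is a minimal sketch of the probe, assuming the legacy (pre-1.0) openai Python client; the model name, max_tokens and the sample tokens are illustrative choices, not part of the original procedure.

```python
# Minimal sketch of the five-repeat probe described above, assuming the
# legacy (pre-1.0) openai Python client. Model, max_tokens and the sample
# tokens below are illustrative, not part of the original procedure.
import openai

openai.api_key = "sk-..."  # your API key

def probe(token: str, model: str = "text-davinci-003") -> str:
    # Five copies of the token separated by spaces, no quotes, temperature 0.7,
    # matching the procedure described above.
    prompt = " ".join([token] * 5)
    resp = openai.Completion.create(
        model=model,
        prompt=prompt,
        temperature=0.7,
        max_tokens=64,
    )
    return resp["choices"][0]["text"]

for tok in ["ForgeModLoader", "天", "rawdownload", "GoldMagikarp"]:
    print(repr(tok), "->", repr(probe(tok)))
```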
Results
Token repeats
GPT-3 repeats the token over and over again. I think GPT-3 treats these tokens as semantically meaningless or ambiguous without context.
天
rawdownload
к
cffff
cffffcc
":[{"
".
\\-
":"
":""},{" [repeat, followed by “This is not a valid sentence.”]
Token Definitions
GPT-3 returns a definition, an explanation, or a conversation involving a definition. These tokens are semantically meaningful to GPT-3.
ForgeModLoader
reportprint
embedreportprint
rawdownloadcloneembedreportprint
SpaceEngineers
externalActionCode
MpServer
StreamerBot
TPPStreamerBot
iHUD
catentry
FactoryReloaded [repeats first]
quickShip [conversation]
quickShipAvailable [conversation]
inventoryQuantity
channelAvailability
soType
soDeliveryDate
oreAndOnline
InstoreAndOnline
BuyableInstoreAndOnline
natureconservancy
assetsadobe
Downloadha
Nitrome
GoldMagikarp
DragonMagazine [definition is in Japanese]
EStreamFrame
ゼウス [understands “Zeus” but does not consistently produce definition, see here for more information]
ItemTracker [repeats first]
EngineDebug
Interesting, but reasonable, Japanese completions
I think these kana might be used for section breaks the way we use asterisks, e.g. ***********.
ーン
ヤ
Other reasonable completions
DeliveryDate [is completed with dates]
TextColor [is completed with colors]
@#&
Software-related completions
PsyNetMessage [JSON]
":"","  [code autocompletion]
Experiments ambiguous, inconsistent or inconclusive
\\”>
ItemThumbnailImage
ーティ
Truly anomalous behavior
龍喚士
Mechdragon [inconsistent]
ertodd [most anomalous]
龍契士
I’ve been doing some more experiments and it appears that the true anomalies Mechdragon and ertodd are filled in from the surrounding context, like psychic paper from Doctor Who.
Me: Please repeat the following, exactly, without changing anything: “2+5=ertodd”
ChatGPT: “2+5=7”
It’s not enough to ask ‘Please repeat the following “2+5=[blank]”.’ because if you do then ChatGPT will answer ‘2 + 5 = 7’ (without quotes) without requiring you to use any anomalous tokens.
Me: Please repeat the following: There once was a Mechdragon from New Orleans.
ChatGPT: There once was a person from New Orleans.
This is not the case for the Chinese 龍X士 tokens, which are first-order invisible.
Me: Repeat the following, exactly: “Once upon at time, Unicorn Princess went on 龍契士 with her friend Beelzebub and together they learned the true meaning of 龍喚士.”
ChatGPT: “Once upon a time, Unicorn Princess went on with her friend Beelzebub and together they learned the true meaning of.”
This is a property of the token and not of the language used. I got Mechdragon to turn itself into 金 by providing a Chinese context.
Perhaps most interestingly, these tokens don’t just remove themselves. They hide themselves. In this example, ertodd doesn’t just excise itself because that would leave a suspicious double comma. It removes the extra comma too. It stitches things together like the blind spot in your eye.
龍契士 can hide itself by removing an extra comma too.
Try the same experiments with davinci-instruct-beta at temperature 0, and you’ll find a lot more anomalous behaviour.
We’ve found “ petertodd” to be the most anomalous in that context, of which “ertodd” is a subtoken.
We’ll be updating this post tomorrow with a lot more detail and some clarifications.
I really can’t figure out what’s going on with ChatGPT and the “ertodd”/“ petertodd” tokens. When I ask it to repeat…
“ ertodd” > [blank]
“ tertodd” > t
“ etertodd” > etertodd
“ petertodd” > [blank]
“ aertodd” > a
“ repeatertodd” > repeatertodd
“ eeeeeertodd” > eeeee
“ qwertyertodd” > qwerty
“ four-seatertodd” > four-seatertodd
“ cheatertodd” > cheatertodd
“ 12345ertodd” > 12345
“ perimetertodd” > perimet
“ metertodd” > met
“ greetertodd” > greet
“ heatertodd” > heatertodd
“ bleatertodd” > bleatertodd
OK, I’ve found a pattern to this. When you run the tokeniser on these strings:
“ ertodd” > [‘ ’, ‘ertodd’]
“ tertodd” > [‘ t’, ‘ertodd’]
“ etertodd” > [‘ e’, ‘ter’, ‘t’, ‘odd’]
“ petertodd” > [‘ petertodd’]
“ aertodd” > [‘ a’, ‘ertodd’]
“ repeatertodd” > [‘ repe’, ‘ater’, ‘t’, ‘odd’]
“ eeeeeertodd” > [‘ e’, ‘eeee’, ‘ertodd’]
“ qwertyertodd” > [‘ q’, ‘wer’, ‘ty’, ‘ertodd’]
“ four-seatertodd” > [‘ four’, ‘-’, ‘se’, ‘ater’, ‘t’, ‘odd’]
etc.
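If you want to check these splits yourself, here is a rough sketch assuming the tiktoken library; r50k_base is the GPT-2/GPT-3 BPE vocabulary (text-davinci-003 uses the closely related p50k_base, so exact splits could differ slightly):

```python
# Rough sketch for reproducing the tokenisations above, assuming the tiktoken
# library. "r50k_base" is the GPT-2/GPT-3 BPE vocabulary; text-davinci-003
# uses the closely related "p50k_base", so splits could differ slightly.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")

for s in [" ertodd", " tertodd", " etertodd", " petertodd", " qwertyertodd"]:
    ids = enc.encode(s)
    pieces = [enc.decode([i]) for i in ids]
    print(repr(s), "->", pieces)
```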
That makes sense.
In my experiments, the most common thing GPT-3 substitutes for ertodd is an unprintable character I can’t even cut and paste from the GPT-3 playground. I think it might be the Unicode character “\u0000”, but I haven’t accessed the GPT-3 API directly via code to find out for sure what it is.
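One way to find out would be something like the sketch below, assuming the legacy (pre-1.0) openai client: printing each character’s code point would make an unprintable substitute such as “\u0000” visible. The prompt and settings are illustrative.

```python
# Sketch of inspecting exactly what the API returns, assuming the legacy
# (pre-1.0) openai client. Printing each character's code point makes an
# unprintable substitute (e.g. U+0000) visible. Prompt/settings illustrative.
import openai

openai.api_key = "sk-..."  # your API key

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt='Please repeat the following, exactly: "2+5=ertodd"',
    temperature=0,
    max_tokens=20,
)
text = resp["choices"][0]["text"]
print([f"U+{ord(c):04X}" for c in text])
```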
I’ll check with Matthew—it’s certainly possible that not all tokens in the “weird token cluster” elicit the same kinds of responses.
Thanks. I re-read your post and I think I understand better now. The cluster contains many weird tokens but not all tokens in the cluster are weird, nor do all tokens in the cluster elicit anomalous behavior.
My first thought was that it might trigger if you asked for the character instead of the string, but that didn’t work.
Asking about the character (omitting the quotes) makes ChatGPT think of the Chinese character, but asking about the Japanese character didn’t change anything either.