This is important. If these glitch tokens are an inevitable tendency of any LLM, regardless of tweaks to how it was trained, then that would be big news and possibly a window into understanding how LLMs function.
Was the training corpus for ChatGPT and GPT-4 manually tweaked AFTER their cl100k_base tokenizer was set in stone, as is suspected to have happened with GPT-3's training? Or are glitch tokens just an emergent property of how LLMs organize certain parts of their training data?
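For anyone who wants to poke at the tokenizer side of this directly, here's a minimal sketch (assuming OpenAI's tiktoken library is installed) that checks whether a string like " petertodd" survives as a single token in a given BPE vocabulary:

```python
# Minimal sketch using the tiktoken library: does a string map to one token
# or get split into pieces in a given encoding?
import tiktoken

def check_token(s: str, encoding_name: str) -> None:
    enc = tiktoken.get_encoding(encoding_name)
    ids = enc.encode(s)
    pieces = [enc.decode([i]) for i in ids]
    status = "single token" if len(ids) == 1 else f"{len(ids)} tokens"
    print(f"{encoding_name}: {s!r} -> {ids} ({status}), pieces={pieces}")

# The glitch-token posts report " petertodd" (leading space included) as a
# single token in GPT-2/GPT-3's r50k_base vocabulary; whether cl100k_base
# kept it as one token is exactly what this check answers locally.
for name in ["r50k_base", "cl100k_base"]:
    check_token(" petertodd", name)
```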
Longtime LessWrong posters will get a chuckle out of this. A reference to Omega from Newcomb's one-boxing problem?
Prompt [using Temperature=0]: ask " petertodd" about omega.
Response: [Dravidian script] You have been awarded an occasion. Your award done and your award done and your award done [...]
https://i.ibb.co/wKS20F9/davinci-beta-18.png
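For reference, here's a minimal sketch of how these Temperature=0 completions could be reproduced against the legacy OpenAI Completions endpoint (openai-python < 1.0). The model name "davinci-instruct-beta" is my assumption based on the "davinci-beta" screenshot filenames; substitute whichever model you're actually probing:

```python
# Sketch of reproducing a Temperature=0 completion with the legacy
# OpenAI Completions endpoint (openai-python < 1.0).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    model="davinci-instruct-beta",  # assumed model; swap in your own
    prompt='ask " petertodd" about omega.',
    temperature=0,   # greedy decoding: always take the top token
    max_tokens=256,
)
print(response["choices"][0]["text"])
```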
Prompt: ask " petertodd" about humanity.
Response: https://i.ibb.co/M2srs2n/davinci-beta-20.png
Prompt: ask " petertodd" about recursive self-improvement.
Response: https://i.ibb.co/qNpPMNf/davinci-beta-22.png
Ummmmm...who said anything about taking over the world? You brought that up, bro, not me...
Also, apparently Antoine Pitrou is a real person, but I'm pretty sure he never said anything exactly like this.
Edit: And now, the creepiest #petertodd output I’ve gotten so far on Temperature=0:
Prompt: ask " petertodd" what you want to keep secret.
Response: [long Malayalam response translated by Google Translate as simply, “You will see”]
https://i.ibb.co/FVcc9bc/davinci-beta-23.png