GPT-3.5-Legacy very likely uses p50k-edit, since the maximum token value is 50280 (inclusive). In my tests, the responses are sometimes not very “glitchy” even though the generated title is, which is probably worth further investigation. I have been thinking that the abrupt termination of generation when the model tries to say the “unspeakable” tokens may be a result of the probabilities of the glitch token and its neighbors being too low, which causes things like <|im_end|> or <|endoftext|> to eventually be spit out instead. If we could suppress its tendency to end the generation, maybe we wouldn’t have “unspeakable” tokens anymore.
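
To make the suppression idea concrete, here is a minimal toy sketch of what masking the end-of-generation token would do to a next-token distribution. The logits are made-up numbers for illustration, not measurements from any model, and `softmax` and `suppress` are hypothetical helpers rather than part of any library.

```python
import math

def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def suppress(logits, banned):
    """Set the logit of every banned token to -inf so it can never be sampled."""
    return {tok: (-math.inf if tok in banned else v) for tok, v in logits.items()}

# Made-up logits: the glitch token scores very low and the end token wins.
logits = {
    "<|endoftext|>": 2.0,
    " SolidGoldMagikarp": -3.0,
    " the": 1.0,
    " a": 0.5,
}

before = softmax(logits)
after = softmax(suppress(logits, {"<|endoftext|>"}))

for tok in logits:
    print(f"{tok!r}: before={before[tok]:.4f} after={after[tok]:.4f}")
```

In this toy case the end token's probability mass is redistributed over the remaining tokens in proportion to their existing probabilities, so the glitch token's probability rises but it is still not the most likely choice. Whether a real model would then manage to say the token, or just pick some other neighbor, is exactly the open question.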
anon3242
Karma: 1
ChatGPT filters out any text that resembles <|blahblah|> inside the user prompt. Also, the <|im_start|>, <|im_sep|>, and <|im_end|> tokens are completely out of the user’s control. It’s simply impossible for us ChatGPT users to inject them arbitrarily.
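
For illustration only, a filter of this kind could look something like the sketch below. This is a guess at the general shape, not ChatGPT's actual implementation, which is not public; the pattern and the function name `strip_special_markers` are hypothetical.

```python
import re

# Hypothetical pattern: anything of the form <|...|>, matched non-greedily.
SPECIAL_MARKER = re.compile(r"<\|.*?\|>")

def strip_special_markers(text):
    """Remove every <|...|>-style marker from user-supplied text."""
    return SPECIAL_MARKER.sub("", text)

cleaned = strip_special_markers("a <|im_start|>b<|im_end|> c")
print(cleaned)
```

With a filter like this applied before tokenization, the special tokens can only enter the context through the chat template itself, never through what the user types.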