This oddity is making the rounds on Reddit, Twitter, Hacker News, etc.
Is OpenAI censoring references to one of these people? If so, why?
https://en.m.wikipedia.org/wiki/David_Mayer_de_Rothschild
https://en.wikipedia.org/wiki/David_Mayer_(historian)
Edit: More names have been found that behave similarly:
Brian Hood
Jonathan Turley
Jonathan Zittrain
David Faber
David Mayer
Guido Scorza
Update: “David Mayer” no longer breaks ChatGPT but the other names are still problematic.
There’s a theory (Twitter citing Reddit) that at least one of these people filed a GDPR right-to-be-forgotten request. So one hypothesis would be that all of these people filed such requests.
But the Reddit post (as of right now) guesses that it might not be about GDPR requests per se, but something more general: “It’s a last resort fallback for preventing misinformation in situations where a significant threat of legal action is present”.
OA has indirectly confirmed it is a right-to-be-forgotten thing in https://www.theguardian.com/technology/2024/dec/03/chatgpts-refusal-to-acknowledge-david-mayer-down-to-glitch-says-openai
Good example of the redactor’s dilemma and the need for Glomarizing: by confirming that they have a tool to flag names and hide them, and then by neither confirming nor denying that this was related to a right-to-be-forgotten order (a meta-gag), they confirm that it’s a right-to-be-forgotten bug.
Similar to when OA people were refusing to confirm or deny signing OA NDAs which forbade them from discussing whether they had signed an OA NDA… That was all the evidence you needed to know that there was a meta-gag order (as was eventually confirmed more directly).
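The confirmed “tool to flag names and hide them” is presumably a simple hard-coded filter wrapped around the model, not anything inside the model itself. A minimal sketch of what such a filter might look like (the flagged names and the error string come from the observed behavior; the matching logic is my assumption):

```python
# Hypothetical sketch of a hard-coded name filter applied outside the model.
# The flagged names and the error string are taken from observed ChatGPT
# behavior; the matching logic itself is an assumption.
import re

FLAGGED_NAMES = [
    "Brian Hood",
    "Jonathan Turley",
    "Jonathan Zittrain",
    "David Faber",
    "Guido Scorza",
]

# Match each name case-insensitively, tolerating extra whitespace
# between the first and last name.
_PATTERNS = [
    re.compile(r"\b" + r"\s+".join(map(re.escape, name.split())) + r"\b",
               re.IGNORECASE)
    for name in FLAGGED_NAMES
]

def filter_output(text: str) -> str:
    """Replace the whole response with a refusal if any flagged name appears."""
    if any(p.search(text) for p in _PATTERNS):
        return "I'm unable to produce a response."
    return text

print(filter_output("The mayor of Hepburn Shire was named in the complaint."))
print(filter_output("Ask me about Brian Hood."))
```

A filter this crude would explain why the block is absolute (no paraphrase of the name gets through to the user) while the model itself clearly still knows about these people.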
I don’t think it’s necessarily GDPR-related, but the names Brian Hood and Jonathan Turley make sense from a legal-liability perspective. According to Ars Technica:
Interestingly, Jonathan Zittrain is on record saying the Right to be Forgotten is a “bad solution to a real problem” because “the incentives are clearly lopsided [towards removal]”.
User throwayian on Hacker News ponders an interesting abuse of this sort of censorship:
This looks like it’s related to the phenomenon of glitch tokens:
https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
https://www.lesswrong.com/posts/f4vmcJo226LP7ggmr/glitch-token-catalog-almost-a-full-clear
ChatGPT no longer uses the same tokenizer that it used when the SolidGoldMagikarp phenomenon was discovered, but its new tokenizer could be exhibiting similar behavior.
It’s not a classic glitch token: those did not trigger the current “I’m unable to produce a response” error the way “David Mayer” does.
It would also be odd as a glitch token. These are space-separated names, so most tokenizers will tokenize the first and last names as separate tokens. And glitch tokens appear to be caused by undertraining, but how could that be the case for a phrase like “David Mayer”, which has countless instances across the Internet and no apparent reason to be filtered out by data-curation processes the way glitch-token strings often were?
Probably because of a terrorist who used the alias David Mayer.
I don’t think this explanation makes sense. I asked ChatGPT “Can you tell me things about Akhmed Chatayev”, and it had no problem using his actual name over and over. I asked about his aliases and it said
Then threw an error message. Edit: upon refresh it said more:
(I didn’t stop copying there; that was the end of the answer. Full chat)
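That mid-answer cutoff is consistent with the filter being applied to the streamed output rather than to the prompt: the model is allowed to start answering, and generation is aborted the moment a flagged string appears in the stream. A sketch of that mechanism (the flagged name and error string come from the observed behavior; the streaming logic is my assumption):

```python
# Hypothetical sketch: a filter on the *streamed* output rather than the
# prompt. Generation is aborted as soon as a flagged string appears in the
# accumulated text, which would explain answers that stop mid-sentence.

FLAGGED = "David Mayer"
ERROR = "I'm unable to produce a response."

def stream_with_filter(tokens):
    """Yield tokens until the accumulated text contains a flagged string."""
    emitted = ""
    for tok in tokens:
        emitted += tok
        if FLAGGED in emitted:
            # Abort mid-answer with the observed error message.
            yield "\n" + ERROR
            return
        yield tok

# The alias answer would be cut off right where the flagged name completes:
answer = ["Chatayev ", "used ", "the ", "alias ", "David ", "Mayer", "."]
print("".join(stream_with_filter(answer)))
```

Under this design the user sees the partial answer up to (but not including) the flagged name, followed by the error, which matches the screenshots of answers dying mid-sentence.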
Maybe ChatGPT is recently more likely to stop mid-sentence.
Something like that happened to me recently on a completely different topic (I wanted to find the author of a poem based on a few lines I remembered): the first answer just stopped in the middle, then I clicked refresh and received a full answer (though a factually wrong one). I can’t link the chat because I have already deleted it.