This oddity is making the rounds on Reddit, Twitter, Hacker News, etc.
Is OpenAI censoring references to one of these people? If so, why?
https://en.m.wikipedia.org/wiki/David_Mayer_de_Rothschild
https://en.wikipedia.org/wiki/David_Mayer_(historian)
Edit: More names have been found that behave similarly:
Brian Hood
Jonathan Turley
Jonathan Zittrain
David Faber
David Mayer
Guido Scorza
Update: “David Mayer” no longer breaks ChatGPT but the other names are still problematic.
There’s a theory (Twitter citing Reddit) that at least one of these people filed a GDPR right-to-be-forgotten request. So one hypothesis would be that all of these people filed such requests.
But the Reddit post (as of right now) guesses that it might not be about GDPR requests per se, but something more general: “It’s a last resort fallback for preventing misinformation in situations where a significant threat of legal action is present”.
OA has indirectly confirmed it is a right-to-be-forgotten thing in https://www.theguardian.com/technology/2024/dec/03/chatgpts-refusal-to-acknowledge-david-mayer-down-to-glitch-says-openai
Good example of the redactor’s dilemma and the need for Glomarizing: by confirming that they have a tool to flag names and hide them, and then by neither confirming nor denying that this was related to a right-to-be-forgotten order (a meta-gag), they confirm that it’s a right-to-be-forgotten bug.
Similar to when OA people were refusing to confirm or deny signing OA NDAs which forbade them from discussing whether they had signed an OA NDA… That was all the evidence you needed to know that there was a meta-gag order (as was eventually confirmed more directly).
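The confirmed “tool to flag names and hide them” is presumably a simple hard-coded filter wrapped around the model, not anything inside the model itself. A minimal sketch of what such a filter might look like (the flagged names and the error string come from the observed behavior; the matching logic is my assumption):

```python
# Hypothetical sketch of a hard-coded name filter applied outside the model.
# The flagged names and the error string are taken from observed ChatGPT
# behavior; the matching logic itself is an assumption.
import re

FLAGGED_NAMES = [
    "Brian Hood",
    "Jonathan Turley",
    "Jonathan Zittrain",
    "David Faber",
    "Guido Scorza",
]

# Match each name case-insensitively, tolerating extra whitespace
# between the first and last name.
_PATTERNS = [
    re.compile(r"\b" + r"\s+".join(map(re.escape, name.split())) + r"\b",
               re.IGNORECASE)
    for name in FLAGGED_NAMES
]

def filter_output(text: str) -> str:
    """Replace the whole response with a refusal if any flagged name appears."""
    if any(p.search(text) for p in _PATTERNS):
        return "I'm unable to produce a response."
    return text

print(filter_output("The mayor of Hepburn Shire was named in the complaint."))
print(filter_output("Ask me about Brian Hood."))
```

A filter this crude would explain why the block is absolute (no paraphrase of the name gets through to the user) while the model itself clearly still knows about these people.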
I don’t think it’s necessarily GDPR-related, but the names Brian Hood and Jonathan Turley make sense from a legal-liability perspective. According to Ars Technica:
Interestingly, Jonathan Zittrain is on record saying the Right to be Forgotten is a “bad solution to a real problem” because “the incentives are clearly lopsided [towards removal]”.
User throwayian on Hacker News ponders an interesting abuse of this sort of censorship:
This looks like it’s related to the phenomenon of glitch tokens:
https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
https://www.lesswrong.com/posts/f4vmcJo226LP7ggmr/glitch-token-catalog-almost-a-full-clear
ChatGPT no longer uses the same tokenizer that it used when the SolidGoldMagikarp phenomenon was discovered, but its new tokenizer could be exhibiting similar behavior.
It’s not a classic glitch token: those did not trigger the current “I’m unable to produce a response” error the way “David Mayer” does.
It would also be odd as a glitch token. These are space-separated names, so most tokenizers will tokenize the first and last names as separate tokens. And glitch tokens appear to be caused by undertraining, but how could that be the case for a phrase like “David Mayer”, which has countless instances across the Internet and no apparent reason to be filtered out by data-curation processes the way glitch-token strings often were?
Probably because of a terrorist who used the alias David Mayer.
I don’t think this explanation makes sense. I asked ChatGPT “Can you tell me things about Akhmed Chatayev”, and it had no problem using his actual name over and over. I asked about his aliases and it said
Then threw an error message. Edit: upon refresh it said more:
(I didn’t stop copying there; that was the end of the answer. Full chat)
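That mid-answer cutoff is consistent with the filter being applied to the streamed output rather than to the prompt: the model is allowed to start answering, and generation is aborted the moment a flagged string appears in the stream. A sketch of that mechanism (the flagged name and error string come from the observed behavior; the streaming logic is my assumption):

```python
# Hypothetical sketch: a filter on the *streamed* output rather than the
# prompt. Generation is aborted as soon as a flagged string appears in the
# accumulated text, which would explain answers that stop mid-sentence.

FLAGGED = "David Mayer"
ERROR = "I'm unable to produce a response."

def stream_with_filter(tokens):
    """Yield tokens until the accumulated text contains a flagged string."""
    emitted = ""
    for tok in tokens:
        emitted += tok
        if FLAGGED in emitted:
            # Abort mid-answer with the observed error message.
            yield "\n" + ERROR
            return
        yield tok

# The alias answer would be cut off right where the flagged name completes:
answer = ["Chatayev ", "used ", "the ", "alias ", "David ", "Mayer", "."]
print("".join(stream_with_filter(answer)))
```

Under this design the user sees the partial answer up to (but not including) the flagged name, followed by the error, which matches the screenshots of answers dying mid-sentence.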
Maybe ChatGPT is recently more likely to stop mid-sentence.
Something like that happened to me recently on a completely different topic (I wanted to find the author of a poem based on a few lines I remembered): the first answer just stopped in the middle, then I clicked refresh and received a full answer (though a factually wrong one). I can’t link the chat because I have already deleted it.