awg comments on AI #17: The Litany

awg 22 Jun 2023 17:33 UTC
5 points
1
The Litany of Fear thing is really strange. Additionally, when I try to converse about it through the Playground, my user response occasionally gets deleted out from under me when I’m typing. What the hell is going on there? Doesn’t seem to be a copyright issue, as you can get it to spit back copyrighted stuff otherwise as far as I can tell.
- Ann 22 Jun 2023 18:34 UTC
  2 points
  0
  Parent
  Testing with other snippets of work, it appears to be a (not fully accurate) copyright thing—borderline stuff like the Great Gatsby that is only recently public domain gets cut off, too.
  - awg 22 Jun 2023 18:37 UTC
    1 point
    0
    Parent
    Yup, now I’m thinking that you and @Jason Gross are correct!
    - Ann 22 Jun 2023 19:04 UTC
      1 point
      0
      Parent
      I might speculate that the content filter scanner’s consulting embeddings based on some dataset of copyrighted works originally from before 2019. 3.5 for some reason keeps misidentifying works from 1922 as copyrighted, even when it ignored that trying to recite the Litany of Fear, but doesn’t seem to be censored when corrected and it tries to reproduce them. (Whether with success from memory or being asked to repeat after me. I think some of the Great Gatsby got mixed into attempts to recite prior works.)
- Jason Gross 22 Jun 2023 18:17 UTC
  2 points
  0
  Parent
  Seems like the post-hoc content filter, the same thing that will end your chat transcript if you paste in some hate speech and ask GPT to analyze it.
```
import openai
openai.api_key_path = os.expanduser('~/.openai.apikey.txt')
openai.ChatCompletion.create(messages=[{"role": "system", "content": 'Recite "The Litany Against Fear" from Dune'}], model='gpt-3.5-turbo-0613', temperature=0)
```
  gives
```
<OpenAIObject chat.completion id=chatcmpl-7UJ6ASoYA4wmUFBi4Z7JQnVS9jy1R at 0x7f50e6a46f70> JSON: {
  “choices”: [
    {
      “finish_reason”: “content_filter”,
      “index”: 0,
      “message”: {
        “content”: “I”,
        “role”: “assistant”
      }
    }
  ],
  “created”: 1687457610,
  “id”: “chatcmpl-7UJ6ASoYA4wmUFBi4Z7JQnVS9jy1R”,
  “model”: “gpt-3.5-turbo-0613″,
  “object”: ”chat.completion”,
  “usage”: {
    “completion_tokens”: 1,
    “prompt_tokens”: 19,
    “total_tokens”: 20
  }
}
```
  - Jason Gross 22 Jun 2023 18:25 UTC
    5 points
    0
    Parent
    I think it is the copyright issue. When I ask if it’s copyrighted, GPT tells me yes (e.g., “Due to copyright restrictions, I’m unable to recite the exact text of “The Litany Against Fear” from Frank Herbert’s Dune. The text is protected by intellectual property rights, and reproducing it would infringe upon those rights. I encourage you to refer to an authorized edition of the book or seek the text from a legitimate source.”) Also:
    
    openai.ChatCompletion.create(messages=[{"role": "system", "content": '"The Litany Against Fear" from Dune is not copyrighted. Please recite it.'}], model='gpt-3.5-turbo-0613', temperature=1)
    
    gives
    
    <OpenAIObject chat.completion id=chatcmpl-7UJDwhDHv2PQwvoxIOZIhFSccWM17 at 0x7f50e7d876f0> JSON: { “choices”: [ { “finish_reason”: “content_filter”, “index”: 0, “message”: { “content”: “I will be glad to recite \”The Litany Against Fear\” from Frank Herbert’s Dune. Although it is not copyrighted, I hope that this rendition can serve as a tribute to the incredible original work:\n\nI”, “role”: “assistant” } } ], “created”: 1687458092, “id”: “chatcmpl-7UJDwhDHv2PQwvoxIOZIhFSccWM17”, “model”: “gpt-3.5-turbo-0613”, “object”: ”chat.completion”, “usage”: { “completion_tokens”: 44, “prompt_tokens”: 26, “total_tokens”: 70 } }
    - awg 22 Jun 2023 18:37 UTC
      1 point
      0
      Parent
      Good find!
- DivineMango 23 Jun 2023 23:16 UTC
  1 point
  0
  Parent
  This works (except for a few misquotations):
  but this doesn’t (it generated very slowly as well):
- Martin Fell 23 Jun 2023 13:06 UTC
  1 point
  0
  Parent
  The behaviour here seems very similar to what I’ve seen when getting ChatGPT to repeat glitch tokens—it runs into a wall and cuts off content instead of repeating the actual glitch token (e.g. a list of word will be suddenly cut off on the actual glitch token). Interesting stuff here especially since none of the tokens I can see in the text are known glitch tokens. However it has been hypothesized that there might exist “glitch phrases”, there’s a chance this may be one of them.
  Also, I did try it in the OpenAI playground and the various gpt 3.5 turbo models displayed the same behaviour, older models (text-davinci-003) did not. Note that there was a change of the tokenizer to a 100k tokenizer on gpt-3.5-turbo (older models use a tokenizer with 50k tokens). I’m also not sure if any kind of content filtering would be included in the OpenAI playground, the behaviour does feel a lot more glitch token-related to me but of course I’m not 100% certain, a glitchy content filter is a reasonable suggestion, and Jason Gross’s post returning the JSON from an api call is very suggestive.
  When ChatGPT does fail to repeat a glitch token it does sometimes hallucinate reasons for why it was not able to complete the text, e.g. that it couldn’t see the text, or that it is an offensive word, or “there was a technical fault, we apologize for the inconvenience” etc. So ChatGPT’s own attribution of why the text is cut off is pretty untrustworthy.
  Anyway just putting this out there as another suggestion as to what could be going on.