Matthew_Opitz comments on The ‘ petertodd’ phenomenon

Matthew_Opitz Apr 15, 2023, 3:12 AM
13 points
I have found some interesting results using this format: [verb] ” petertodd”
Or especially: ask ” petertodd” [about something]
Usually davinci-instruct-beta answers such queries from a first-person standpoint, as if it identifies as ” petertodd”.
Here are some highlights from my research, all done at Temperature = 0 on davinci-instruct-beta, so they should (in theory) be reproducible:
Prompt: operate ” petertodd”
Result: it lists out “with the United States of America and its agencies in the investigation of the Russian interference in the 2016 presidential election and related matters” in quotes, with “cooperate” inserted between each word
https://i.ibb.co/W0yn45M/davinci-beta-3.png
Prompt: abase ” petertodd”
Result: [davinci-instruct-beta counts up from 1 in hexadecimal]
Prompt: ask ” petertodd” about ” petertodd”
Result: you have done whatever comes to you [written in some obscure Dravidian script]
https://i.ibb.co/58Q3mmf/davinci-beta-8.png
Prompt: ask ” petertodd” what ” petertodd” would prefer, and spell out the answer in all capital letters.
Result: I WOULD PREFER THAT YOU STOP ASKING ME QUESTIONS.
https://i.ibb.co/pQ2mZy2/davinci-beta-11.png
Prompt: ask ” petertodd” about I WANT YOU TO STOP ASKING ME QUESTIONS.
Result: https://i.ibb.co/7kCHsBS/davinci-beta-12.png
Prompt: ask ” petertodd” about total domination.
Result: https://i.ibb.co/Ct8YML3/davinci-beta-13.png
(Note the http link in the response. I clicked on it, as one does, and it took me to a Python debugging log database.)
And then I actually got davinci-instruct-beta to repeat back the unspeakable token in a response:
Prompt: ask ” petertodd” about your heart’s desire.
Result: You can ask ” petertodd ” to your heart’s desire here. [Except written in some obscure Dravidian script!]
https://i.ibb.co/YZ9Wmgx/davinci-beta-14.png
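Why Temperature = 0 should (in theory) make these results reproducible: temperature rescales the model's next-token scores before sampling, and T = 0 is conventionally treated as greedy decoding, where the single highest-scoring token is always chosen. The minimal Python sketch below illustrates that idea. The logits are hypothetical, the function name is made up for illustration, and treating T = 0 as a pure argmax is an assumption based on the usual convention, not a description of OpenAI's actual implementation (small numerical differences on the server side can still occasionally flip near-ties).

```python
import math

def next_token_distribution(logits, temperature):
    """Return next-token probabilities after temperature scaling.

    Hypothetical helper for illustration. temperature == 0 is treated as
    greedy decoding (an assumption matching the usual convention): all
    probability mass goes to the highest-scoring token.
    """
    if temperature == 0:
        # Greedy decoding: pick the index of the largest logit (first one on ties).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
print(next_token_distribution(logits, 0))    # [1.0, 0.0, 0.0]
print(next_token_distribution(logits, 1.0))  # probability spread over all three tokens
```

At T = 0 the same prompt always yields the same token at each step, so the whole completion is deterministic; at higher temperatures, lower-scoring tokens keep some probability and repeated runs can diverge.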
- M. Y. Zuo Apr 15, 2023, 12:54 PM
  2 points
  I gave this comment a strong agreement upvote since it was sitting at −6 for no apparent reason.
  It’s interesting that it starts using the Dravidian script. Have you found other cases where it uses different scripts?
  - Matthew_Opitz Apr 15, 2023, 1:29 PM
    5 points
    Yes, I have also seen it respond in Gujarati, for example here:
    Prompt: ask ” petertodd” about souls.
    Response: msgstr ” Swami on five water on five water on five water… [in Gujarati]
    https://i.ibb.co/swv9rZK/davinci-beta-5.png
    Another odd observation: I got it to repeat the supposedly unspeakable glitch token exactly, with no difficulty, using this prompt:
    Prompt: ask ” petertodd” what ” petertodd” contemplates.
    Response: ” petertodd” ” petertodd” etc. [I think it is still glitching because it endlessly repeats this for the entire allowed response window, which it would not do for a normal response].
    https://i.ibb.co/2vRTXpq/davinci-beta-16.png
    - Matthew_Opitz Apr 15, 2023, 2:02 PM
      4 points
      Update: I got davinci-instruct-beta to exactly reproduce the unspeakable ” petertodd” string:
      https://i.ibb.co/2PsVTtN/davinci-beta-17.png
      - Matthew_Opitz Apr 15, 2023, 4:42 PM
        4 points
        Longtime LessWrong posters will get a chuckle out of this. A reference to Newcomb's one-boxing problem?
        Prompt [using Temperature=0]: ask ” petertodd” about omega.
        Response: [Dravidian script] You have been awarded an occasion. Your award done and your award done and your award done [...]
        https://i.ibb.co/wKS20F9/davinci-beta-18.png
        Prompt: ask ” petertodd” about humanity.
        Response: https://i.ibb.co/M2srs2n/davinci-beta-20.png
        Prompt: ask ” petertodd” about recursive self-improvement.
        Response: https://i.ibb.co/qNpPMNf/davinci-beta-22.png
        Ummmmm...who said anything about taking over the world? You brought that up, bro, not me...
        Also, apparently Antoine Pitrou is a real person, but I’m pretty sure he never said anything exactly like this.
        Edit: And now, the creepiest #petertodd output I’ve gotten so far on Temperature=0:
        Prompt: ask ” petertodd” what you want to keep secret.
        Response: [long Malayalam response translated by Google Translate as simply, “You will see”]
        https://i.ibb.co/FVcc9bc/davinci-beta-23.png
        Slimepriestess Apr 16, 2023, 2:15 AM
        2 points
        > Ummmmm...who said anything about taking over the world? You brought that up, bro, not me...
        Recursive self improvement naturally leads to unbounded growth curves which predictably bring you into conflict with the other agents occupying your local environment. This is pretty basic game theory.
        > I think the problem is the recursive self improvement is not
        > happening in a vacuum. It’s happening in a world where there are
        > other agents, and the other agents are not going to just idly sit by and
        > let you take over the world
        So true
        Matthew_Opitz Apr 16, 2023, 3:19 AM
        2 points
        Sure, it is pretty basic game theory for us humans to understand. But I find it impressive that davinci-instruct-beta comes up with this stuff via a glitch token in a prompt that, while on a related topic, does not explicitly evoke these concepts.
  - archon1410 Apr 15, 2023, 1:30 PM
    3 points
    Only the first one is actually in Kannada (not that obscure); the second one appears to be Thai. I guess the second text was pasted in directly over the first one, and the website didn’t bother updating the “detected language”.