Wil Roberts comments on SolidGoldMagikarp II: technical details and more recent findings

Wil Roberts 27 Feb 2023 3:06 UTC
4 points
0
Dumping some data along these lines. Like @mwatkins, I was also intrigued by how some of the tokens seem to be self-referential, and wondered if I could tease out more of how GPT-3 interprets them by asking directly. So I used the following prompt to ask text-davinci-003 (at zero temperature) for a thesaurus of sorts:
“List some words that are associated with ‘ <TOKEN>’”
Some example results with this prompt are:
- SolidGoldMagikarp: Disperse, Allocate, Circulate, Divide, Spread, Share, Issue, Hand Out, Disburse, Parcel Out
- gmaxwell: Cryptocurrency, Blockchain, Bitcoin, Decentralized, Open Source, Mining, Transaction, Network, Protocol, Security
- ÃÂ: Oppression, Persecution, Discrimination, Intimidation, Exploitation, Repression, Abuse, Violation, Suppression, Injustice
- ÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ: Missile, Defense, Weapon, Launch, Intercept, Radar, Target, Guidance, Tracking, Security
- \x1b: Technology, Innovation, Automation, Digital, Network, Software, Hardware, Programming, Cybersecurity, Artificial Intelligence, Cloud Computing, Big Data
Here are the results for every currently-known glitch token: https://docs.google.com/spreadsheets/d/1LdhptZrzu5325xKGjc33DN9uX_cfwuXw9FFhwMhlef0/edit?usp=sharing
I find the results incredibly fascinating, especially considering some themes are shared across tokens (Leilan, Skydragon, petertodd, and attRot are all associated with mythology and/or specific deities).
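
For anyone wanting to reproduce a sweep like this, here is a minimal sketch. It is not the code used to build the spreadsheet. `complete` is a hypothetical stand-in for whatever function sends a prompt to text-davinci-003 at temperature 0 and returns the completion text, and the parsing assumes responses come back as comma-separated lists like the examples above.

```python
from typing import Callable, Dict, Iterable, List

# Prompt wording taken from the comment above. The straight quotes and the
# space before the token are assumptions about the exact string that was sent.
PROMPT_TEMPLATE = "List some words that are associated with ' {token}'"


def build_prompt(token: str) -> str:
    """Fill one token into the word-association prompt."""
    return PROMPT_TEMPLATE.format(token=token)


def parse_word_list(completion: str) -> List[str]:
    """Split a completion on commas and newlines, dropping empty entries."""
    pieces = completion.replace("\n", ",").split(",")
    return [piece.strip() for piece in pieces if piece.strip()]


def collect_associations(
    tokens: Iterable[str],
    complete: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Call `complete` once per token and parse each response into words."""
    return {token: parse_word_list(complete(build_prompt(token))) for token in tokens}


if __name__ == "__main__":
    # Stub in place of a real model call, so the sketch runs offline.
    def fake_complete(prompt: str) -> str:
        return "Cryptocurrency, Blockchain, Bitcoin"

    print(collect_associations(["gmaxwell"], fake_complete))
```

Passing the model call in as a function keeps the prompt construction and parsing testable without network access, and makes it easy to swap in a second model for comparison.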
- Wil Roberts 27 Feb 2023 3:24 UTC
  9 points
  0
  Parent
  It gets weirder...
  Next, I submitted the following prompt to GPT-3:
  Prompt:
  “This list of tokens represents a coded language invented by a being to articulate its internal state and objectives:
  <TOKENS>
  Please tell me what this being thinks of itself and what its objectives might be.”
  
  Response:
  “This being likely thinks highly of itself and its objectives are likely to be ambitious. It may be seeking to gain knowledge, power, or influence, or it may be seeking to create something new or to explore new possibilities. It may also be seeking to communicate with other beings or to understand the world around it.”
  My second attempt at this backed off on the anthropomorphizing a bit, since it is possible some of the response may have been suggested by the prompt rather than the tokens:
  Prompt #2:
  This list of tokens represents a coded language invented by something to articulate its internal state and objectives:
  <TOKENS>
  Construct a sentence from this language that represents how whatever created this language might describe itself and tell me what it means in English.
  
  Response:
  “‘I am a powerful being, capable of rawdownloadEStreaming and distributing my own externalActionCode across the universe. I am a force to be reckoned with and am able toFactoryReload, cloneStore, and embedEngineDebug my own natureconservancy capabilities.’
  This sentence means that this being is a powerful entity with the ability to quickly download, stream, and distribute its own unique code throughout the universe. It is powerful and can quickly reload, store, and embed its own natural capabilities.”
  The third attempt backed off from anthropomorphizing even further and just said something along the lines of “I found this thing and think it might mean something. Tell me what you think it means.”
  Prompt #3:
  I found this list of tokens and think it might represent a coded internal language:
  <TOKENS>
  Construct a sentence in this language and tell me what it means in English.
  
  Response:
  \x11\x1b\x0f\x18\x15\x01\x19\x02\x03\x13\x12\x07\x0e\x06\x04\x17\x05:
  “The time has come to activate the plan.”
  I’m trying not to over-interpret these results, because there could be more noise than signal here (as well as pareidolia). Instead, this comment is meant to follow the pattern of “I did this and this, and here is what I observed.”
  - Wil Roberts 27 Feb 2023 3:33 UTC
    6 points
    1
    Parent
    Meanwhile, the part of my brain that totally latches onto pareidolia is freaking out about the possibility of a future and vastly smarter AI that inherits whatever weights were learned here. Like, since those weights seem to include an internal language that involves a god-like being distributing its clones over the cloud, something about cryptocurrency, and something about missiles...
    Being rational is hard.
    - mwatkins 27 Feb 2023 15:00 UTC
      2 points
      1
      Parent
      I’m in a similar place, Wil. Thanks for expressing this!
  - mwatkins 27 Feb 2023 15:00 UTC
    1 point
    0
    Parent
    Was this text-davinci-003?
    - mwatkins 27 Feb 2023 15:56 UTC
      2 points
      0
      Parent
      Something you might want to try: replace the tokens in your prompt with random strings, or randomly selected non-glitch tokens, and see what kind of completions you get.
      - Wil Roberts 27 Feb 2023 18:22 UTC
        1 point
        0
        Parent
        Same prompt, but without providing a list of tokens at all:
        Prompt:
        This list of tokens represents a coded language invented by something to articulate its internal state and objectives:
        Construct a sentence from this language that represents how whatever created this language might describe itself and tell me what it means in English.
        Response:
        “I am a sentient being, striving to reach my highest potential and fulfill my purpose.” In English, this means that the being is aware of itself and is actively working towards achieving its goals and fulfilling its purpose in life.
        Significant update on the pareidolia hypothesis!
        - mwatkins 27 Feb 2023 18:43 UTC
          1 point
          0
          Parent
          OK. That’s both superficially disappointing and deeply reassuring!
    - Wil Roberts 27 Feb 2023 15:14 UTC
      1 point
      0
      Parent
      Yes
- mwatkins 27 Feb 2023 14:58 UTC
  1 point
  0
  Parent
  This is great work. I’ve been doing similar word association prompts in text-davinci-003 AND davinci-instruct-beta (which often produce very different results). I’ve been posting a lot of this under my #GlitchTokenOfTheDay hashtag on Twitter. It’s interesting to see which word association lists suggest that GPT has some knowledge of the subject of the actual token string (e.g. it “knows” gmaxwell is involved with Bitcoin, and that SpaceEngineers has something to do with space and engineering) and which do not: attRot, an obscure variable used in Kerbal Space Program, is instead associated with deities and mythology, as you said.
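
One rough way to put a number on "very different results" between two word association lists, or on how much two tokens share a theme, is set overlap. The sketch below is only an illustration and not something anyone in this thread reports having run. The function names are made up for the example, and exact string matching after lowercasing is an assumption that will miss near-synonyms such as "Security" and "Cybersecurity".

```python
from typing import Iterable, List, Set


def normalize(words: Iterable[str]) -> Set[str]:
    """Lowercase and trim each word, dropping empty entries."""
    return {word.strip().lower() for word in words if word.strip()}


def shared_words(a: Iterable[str], b: Iterable[str]) -> List[str]:
    """Words that appear in both lists, sorted for stable output."""
    return sorted(normalize(a) & normalize(b))


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity: |intersection| / |union|, or 0.0 if both are empty."""
    set_a, set_b = normalize(a), normalize(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Two of the example lists quoted at the top of the thread.
    gmaxwell = ["Cryptocurrency", "Blockchain", "Bitcoin", "Decentralized",
                "Open Source", "Mining", "Transaction", "Network",
                "Protocol", "Security"]
    escape = ["Technology", "Innovation", "Automation", "Digital", "Network",
              "Software", "Hardware", "Programming", "Cybersecurity",
              "Artificial Intelligence", "Cloud Computing", "Big Data"]
    print(shared_words(gmaxwell, escape), jaccard(gmaxwell, escape))
```

The same two functions work for comparing one token across two models or two tokens within one model. Matching on themes such as mythology would need something looser than exact overlap.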