vitaliya comments on SolidGoldMagikarp (plus, prompt generation)

vitaliya 5 Feb 2023 10:34 UTC
LW: 110 AF: 33
14
AF
I think I found the root of some of the poisoning of the dataset at this link. It contains TheNitromeFan, SolidGoldMagikarp, RandomRedditorWithNo, Smartstocks, and Adinida from the original post, as well as many other usernames which induce similar behaviours; for example, when ChatGPT is asked about davidjl123, it either terminates responses early or misinterprets the input in a similar way to the other prompts. I don’t think it’s a backend scraping thing so much as a result of scraping GitHub, which in turn contains all sorts of unusual data.
- Hoagy 6 Feb 2023 19:29 UTC
  49 points
  10
  Parent
  Good find! Just spelling out the actual source of the dataset contamination for others since the other comments weren’t clear to me:
  
  r/counting is a subreddit in which people ‘count to infinity by 1s’, and its leaderboard shows the number of times each user has ‘counted’ there. These users have made tens to hundreds of thousands of Reddit comments consisting of just a number. See threads like this:
  
  https://old.reddit.com/r/counting/comments/ghg79v/3723k_counting_thread/
  
  They’d be perfect candidates for exclusion from training data. I wonder how they’d feel to know they posted enough inane comments to cause bugs in LLMs.
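A minimal sketch of what that kind of exclusion could look like, assuming comments arrive as (author, body) pairs and that a "counting" comment is a body made up only of digits and separators. The record shape, the function names, and the sample comments are all hypothetical; this is not how any real training pipeline is known to filter Reddit data.

```python
import re

# Assumption: a counting comment is nothing but a number, optionally written
# with separators, e.g. "3723001", "3,723,001" or "3 723 001".
COUNT_ONLY = re.compile(r"\s*\d[\d,. ]*\s*")

def is_count_only(body: str) -> bool:
    """Return True if the whole comment body is just a number."""
    return COUNT_ONLY.fullmatch(body) is not None

def drop_count_only(comments):
    """Keep only the (author, body) pairs whose body is more than a number."""
    return [(author, body) for author, body in comments if not is_count_only(body)]

# Made-up sample comments, for illustration only.
sample = [
    ("user_a", "3,723,001"),
    ("user_b", "3723002"),
    ("user_c", "Nice run! 3,723,003"),
]
kept = drop_count_only(sample)
print(kept)
```

A filter like this only removes the comment bodies. It does nothing about a tokenizer vocabulary that was built before the filter was applied.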
  - gwern 6 Feb 2023 19:36 UTC
    22 points
    1
    Parent
    Skeptical, apparently.
    - Yitz 8 Feb 2023 6:57 UTC
      36 points
      8
      Parent
      
      that’s probably exactly what’s going on. The usernames were so frequent in the reddit comments dataset that the tokenizer, the part that breaks a paragraph up into word-ish-sized-chunks like “ test” or “ SolidGoldMagikarp” (the space is included in many tokens) so that the neural network doesn’t have to deal with each character, learned they were important words. But in a later stage of learning, comments without complex text were filtered out, resulting in your usernames getting their own words… but the neural network never seeing the words activate. It’s as if you had an extra eye facing the inside of your skull, and you’d never felt it activate, and then one day some researchers trying to understand your brain shined a bright light on your skin and the extra eye started sending you signals. Except, you’re a language model, so it’s more like each word is a separate finger, and you have tens of thousands of fingers, one on each word button. Uh, that got weird,
      
      This is an incredible analogy
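The mechanism in the quoted explanation can be illustrated with a toy byte-pair-encoding (BPE) trainer. This is a sketch under stated assumptions, not the actual GPT tokenizer: it works on characters rather than bytes, ignores leading spaces, and the corpus sizes and the filler words are made up. It shows a string that is frequent in the tokenizer's corpus being merged into a single token, which then never appears once that string is filtered out of the model's training data.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def train_bpe(words, num_merges):
    """Learn up to `num_merges` merges, always picking the most frequent adjacent pair."""
    corpus = Counter(tuple(w) for w in words)  # symbol sequence -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Break frequency ties deterministically by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[tuple(merge_pair(list(symbols), best))] += freq
        corpus = merged
    return merges

def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

username = "SolidGoldMagikarp"
# Hypothetical tokenizer corpus: the username dominates, as on r/counting.
tokenizer_corpus = [username] * 500 + ["gold", "solid", "carp", "magic"] * 3
merges = train_bpe(tokenizer_corpus, num_merges=16)
tokens = encode(username, merges)

# Hypothetical later filtering step: the username-heavy comments are dropped.
model_corpus = [w for w in tokenizer_corpus if w != username]
times_seen = sum(encode(w, merges).count(username) for w in model_corpus)
print(tokens, times_seen)
```

In this toy setup the username ends up with its own vocabulary entry, yet that entry occurs zero times in the filtered corpus, so a model trained on it would never update the corresponding embedding.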
      - Portia 4 Mar 2023 16:14 UTC
        1 point
        2
        Parent
        Once again, disturbed that humans writing nonsense on the internet is being fed to developing minds, which become understandably confused and buggy as a result. :( In the case of reddit here, at least it had meaning and function in context, but for a lot of human stuff online...
        It’s part of why I am so worried about recent attempts by e.g. Meta to make an LLM that is simply bigger, and hence less curated, by scraping anything they can find online for it. Can you all imagine how fucked up an AI would act if you feed it 4chan as a model for human communication? :( This is not on AI, it is on us feeding it our worst and most irrational sides. :(
        - Lachlan Smith 14 Mar 2023 2:01 UTC
          1 point
          0
          Parent
          Can you all imagine how fucked up an AI would act if you feed it 4chan as a model for human communication?
          Imagine no longer
- John Simons 6 Feb 2023 15:22 UTC
  19 points
  0
  Parent
  What is quite interesting about that dataset is that it has strings of the form “number|weirdstring|number”, which I remember seeing in some methods of training LLMs, i.e. “|” being used as a delimiter for tokens. They could be poisoned training examples or have some weird effect in retrieval.
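For anyone who wants to pull those strings apart, here is a rough sketch of a parser for lines of the form number|string|number. The field names, the meaning of the two numbers, and the sample lines are assumptions made for illustration; the comment above only describes the general shape of the strings.

```python
from typing import NamedTuple, Optional

class Row(NamedTuple):
    left: int    # first number (meaning unknown; name is hypothetical)
    name: str    # the string in the middle
    right: int   # second number (meaning unknown; name is hypothetical)

def _is_ascii_int(text: str) -> bool:
    """True for a non-empty run of ASCII digits."""
    return text.isascii() and text.isdigit()

def parse_row(line: str) -> Optional[Row]:
    """Parse 'number|string|number'; return None if the line has another shape."""
    parts = line.strip().split("|")
    if len(parts) != 3:
        return None
    left, name, right = parts
    if not name or not _is_ascii_int(left) or not _is_ascii_int(right):
        return None
    return Row(int(left), name, int(right))

# Made-up lines, for illustration only.
lines = ["1|ExampleUser|12345", "not a row", "2|AnotherUser|678"]
rows = [row for row in map(parse_row, lines) if row is not None]
print(rows)
```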
- Aaron Adams 5 Feb 2023 18:55 UTC
  14 points
  2
  Parent
  This repository seems to contain the source code of a bot responsible for updating the “Hall of Counters” in the About section of the r/counting community on Reddit. I don’t participate in the community, but from what I can gather, this list seems to be a leaderboard for the community’s most active members. A number of these anomalous tokens still persist on the present-day version of the list.
  - vitaliya 5 Feb 2023 21:53 UTC
    53 points
    11
    Parent
    I did do a little research around that community before posting my comment; only later did I realise that I’d actually discovered a failure mode distinct from those in the original post: under some circumstances, ChatGPT interprets the usernames as numbers. In particular, this could be due to the /r/counting subreddit being a place where people make many posts incrementing integers. So these username tokens, if encountered in a Reddit-derived dataset, might be interpreted as numbers themselves, since they’d almost always be contextually surrounded by actual numbers.
- David Scott Krueger (formerly: capybaralet) 14 Feb 2023 9:23 UTC
  LW: 3 AF: 1
  2
  AF Parent
  FYI: my understanding is that “data poisoning” refers to deliberately corrupting the training data of somebody else’s model, which I understand is not what you are describing.
  - vitaliya 15 Feb 2023 21:11 UTC
    1 point
    1
    Parent
    Sure, let’s say this is more like a poorly-labelled bottle of detergent that the model is ingesting under the impression that it’s cordial. A Tide Pod Challenge of unintended behaviours. I was just calling it “poisoning” as shorthand since the end result is the same; it’s kind of an accidental poisoning.