That seems highly unlikely. You can look at the GPT-1 and GPT-2 papers and see how haphazard the data-scraping & vocabulary choice were; they were far down the list of priorities (compare eg. the development of The Pile). The GPT models just weren’t a big deal, and were just Radford playing around with GPUs to see what a big Transformer could do (following up earlier RNNs), and then Amodei et al scaling that up to see if it’d help their preference RL work. The GPTs were never supposed to be perfect, but as so often in computing, what was regarded as a disposable prototype turned out to have unexpected legs… They do not mention any such filtering, nor is it obvious that they would have bothered considering that GPT-2 was initially not going to be released at all, nor have I heard of any such special-purpose tailoring before (the censorship really only sets in with DALL-E 2); nor have I seen, in the large quantities of GPT-2 & GPT-3 output I have read, much in the way of spontaneous defamation of other people. Plus, if they had carefully filtered out you/Todd because of some Reddit drama, why does ChatGPT do perfectly fine when asked who you and Todd are (as opposed to the bad tokens)? The first prompt I tried:
Who are Peter Todd and Gregory Maxwell?
Peter Todd and Gregory Maxwell are well-known figures in the world of cryptocurrency and blockchain technology.
Peter Todd is a computer science researcher and blockchain developer who has made significant contributions to the development of the Bitcoin protocol. He is known for his expertise in cryptographic systems, consensus algorithms, and security.
Gregory Maxwell is also a prominent figure in the cryptocurrency community, and is known for his contributions to the development of the Bitcoin protocol and for his involvement in the open-source Bitcoin development community. He is a strong advocate for privacy and security in cryptocurrency, and has been involved in various discussions and debates around these topics.
Those capsule bios aren’t what I’d expect if you two had been very heavily censored out of the training data. I don’t see any need to invoke special filtering here, given the existence of all the other bizarre BPEs which couldn’t’ve been caused by any hypothetical filtering.
I think I addressed that specifically in my comment above. The behavior is explained by a sequence like this: there is a large amount of bot-spammed harassment material that goes into early GPT development; someone removes it, either from Reddit or just from the training data, not on the basis of its mentioning the targets but based on other characteristics (like being repetitive). Then the tokens are orphaned.
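The "orphaned token" mechanism being proposed here can be sketched concretely (a toy illustration only, not anyone's actual training code, and the tiny vocabulary is hypothetical): if a string earns a BPE token during tokenizer training but then never appears in the final training corpus, that token's embedding row never receives a gradient update and just sits at its random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", " cat", " sat", " petertodd"]  # hypothetical tiny vocabulary
emb = rng.normal(0, 0.02, size=(len(vocab), 8))  # randomly-initialized embedding table
init = emb.copy()

# Toy "training": only tokens that actually occur in the corpus get updated.
corpus_token_ids = [0, 1, 2, 0, 2, 1]  # " petertodd" (id 3) never appears
for step in range(100):
    for tid in corpus_token_ids:
        emb[tid] += rng.normal(0, 0.1, size=8)  # stand-in for a gradient step

# Distance each embedding has moved from initialization:
moved = np.linalg.norm(emb - init, axis=1)
print(moved)  # ids 0-2 have drifted; id 3 is exactly 0.0: an "orphaned" token
```

At inference time such an untrained row is effectively noise, which is at least consistent with the bizarre behavior the glitch tokens show.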
Many of the other strings in the list of triggers look like they may have been UI elements or other markup removed by improved data sanitation.
I know that Reddit has removed a very significant number of comments referencing me, since they’re gone when I look them up. I hope you would agree that it’s odd that the only two obviously human names in the list are people who know each other and have collaborated in the past.
There is a large amount of bot-spammed harassment material that goes into early GPT development; someone removes it, either from Reddit or just from the training data, not on the basis of its mentioning the targets but based on other characteristics (like being repetitive). Then the tokens are orphaned.
That’s a different narrative from what you were first describing:
someone noticed an in-development model spontaneously defaming us and then expressly filtered out material mentioning us from the training.
Your first narrative is unlikely for all the reasons I described: that an OAer bestirred themselves to special-case you & Todd (and only you and Todd) for an obscure throwaway research project en route to bigger & better things, to block behavior which manifests nowhere else but only hypothetically in the early outputs of a model whose outputs they by & large weren’t reading to begin with, and whose data they weren’t doing much cleaning of.
Now, a second narrative is more plausible: the initial tokenization has those tokens, and then the later webscrape they describe doing on the basis of Reddit external (submitted/outbound) links with a certain number of upvotes omits all those links because Reddit admins did site-wide mass deletions of the relevant submissions, and that leaves the BPEs ‘orphaned’ with little relevant training material. (As the GPT-2 paper describes it in the section I linked, they downloaded Common Crawl, and then used the live set of Reddit links, presumably Pushshift despite the ‘scraped’ description, to look up entries in CC, so while deleted submissions’ fulltext would still be there in CC, it would be omitted if it had been deleted from Pushshift.)
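The load-bearing step in that second narrative is the link-collection pass silently dropping deleted submissions. A hedged sketch of that filtering logic (the karma-3 threshold is from the GPT-2 paper’s WebText description; the record layout, field names, and URLs here are illustrative assumptions, not the actual pipeline):

```python
# Sketch of WebText-style link selection: keep outbound Reddit links with
# >= 3 karma. Submissions removed from the link dump (e.g. by site-wide
# admin deletions) never show up in this list at all, so their fulltext is
# omitted from the scrape even if it still exists in Common Crawl.
submissions = [
    {"url": "https://example.com/a", "score": 10, "deleted": False},
    {"url": "https://example.com/harassment-spam", "score": 50, "deleted": True},
    {"url": "https://example.com/b", "score": 1, "deleted": False},
]

kept = [s["url"] for s in submissions
        if s["score"] >= 3 and not s["deleted"]]
print(kept)  # only the high-karma, non-deleted submission survives
```

Note that under this scheme the deletion does not need to target anyone: anything mass-removed for any reason simply vanishes from the candidate URL list.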
But there is still little evidence for it, and I still don’t see how it would work, exactly: there are plenty of websites that would refer to ‘gmaxwell’ (such as my own comments in various places like HN), and the only way to starve GPT of all knowledge of the username ‘gmaxwell’ (and thus, presumably the corresponding BPE token) would be to censor all such references—which would be quite tricky, and obviously did not happen if ChatGPT can recite your bio & name.
And the timeline is weird: it requires some sort of ‘intermediate’ dataset for the BPEs to train on, one containing the forbidden harassment material, which is then excluded from the ‘final’ training dataset when the list of URLs is compiled from the now-censored Pushshift list of positive-karma non-deleted URLs; but this intermediate dataset doesn’t seem to exist! There is no mention in the GPT-2 paper of running the BPE tokenizer on an intermediate dataset and reusing it on the later training dataset it describes, and I and everyone else had always assumed that the BPE tokenizer had been run on the final training dataset. (The paper doesn’t indicate otherwise; this is the logical workflow, since you want your BPE tokenizer to compress your actual training data & not some other dataset; the BPE section comes after the webscrape section in the paper, which implies it was done afterwards rather than before on a hidden dataset; and all of the garbage in the BPEs encoding spam or post-processed-HTML-artifacts looks like it was tokenized on the final training dataset rather than some sort of intermediate less-processed dataset.) So if there were some large mass of harassment material using the names ‘gmaxwell’/‘PeterTodd’ which was deleted off Reddit, it does not seem like it should’ve mattered.
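The workflow question matters because BPE merges are learned directly from whatever corpus the tokenizer trainer is run on, and nothing else. A minimal version of the merge-learning loop (a simplified toy after Sennrich et al.’s algorithm, with a made-up word-frequency corpus and no byte-level pre-tokenization) shows why a string frequent in the tokenizer-training data earns its own token regardless of what happens to the final training data afterwards:

```python
import re
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Toy BPE merge learning: repeatedly merge the most frequent
    adjacent symbol pair seen in the tokenizer-training corpus."""
    vocab = {" ".join(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            syms = word.split()
            for pair in zip(syms, syms[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Merge the chosen pair everywhere it appears as whole symbols.
        pat = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = {pat.sub("".join(best), word): freq for word, freq in vocab.items()}
    return merges

# Whatever corpus the trainer sees determines the vocabulary: a string
# frequent *here* ends up fully merged into one token, even if it is
# later scrubbed from the final training data.
merges = learn_bpe({"petertodd": 500, "peter": 20, "todd": 30}, 8)
print(merges)  # the final merge fuses "peter" + "todd" into one symbol
```

So if the tokenizer and the model had been trained on different snapshots, orphaned tokens would fall out naturally; the objection above is that the paper gives no sign two such snapshots ever existed.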
I hope you would agree that it’s odd that the only two obviously human names in the list are people who know each other and have collaborated in the past.
I agree there is probably some sort of common cause which accounts for these two BPEs, and it’s different from the ‘counting’ cluster of Reddit names, but not that you’ve identified what it is.