OpenAI had generated poems in the New Yorker, which suggests they might have had some internal project related to poetry.
With GPT-3.5, I think there’s also “mode collapse” for style in writing prose (e.g. plays or stories).
Claude does not have this mode collapse in poetry or prose. (It may have a much more subtle version of it.) This suggests to me it’d be relatively easy to fix ChatGPT’s issues (as Gwern suggests).
Does anyone know how much poetry and literary prose is in the pre-training sets aside from stuff in Common Crawl?
OpenAI had generated poems in the New Yorker, which suggests they might have had some internal project related to poetry.
I didn’t get that impression when I read it; the NYer author and his friends prompted most of that, even if their friend Dan Selsam happens to work at OpenAI. (He seems to work on math LMs, nothing fiction- or RL-related.) EDIT: the later articles make it clear that Selsam wasn’t supposed to be giving them access to GPT-4-base or other stuff. They were set up with the public Playground interface, so the OA insider role here was limited to showing them a few completions and trying to explain them; presumably they did the rest remotely and partially on their own. Specifically, some parts of it, like the choice of Shel Silverstein (a far-from-obvious poet to pick, even if his fiction is beloved by American children), suggest they (like pretty much anyone interested in GPT-3 poetry) read my page for ideas. Also, again, Leike, who’s in charge at OA, denies having done anything poetry-specific or knowing about the apparent capability-gain.
It may have a much more subtle version of it.
Yeah, that’s a funny thing about mode collapse: it’s really hard to see, and the higher-quality the outputs get, the harder it’ll be to see with ‘the naked eye’. Who knows every literary genre there is, and can patiently prompt them one by one to see which genres a model quietly slides away from & tries to avoid generating text in? Like hands in GANs… it takes a while to begin to see what you aren’t seeing. This is why you need metrics like FID (Fréchet Inception Distance), which work over an entire dataset and measure whether sampled outputs span the entire dataset, rather than focusing on a large subset. However, no one is doing an FID for LLMs for creative purposes. (That would be hard, but not impossible.) So we don’t really have any way to quantify mode collapse in something like poetry.
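To gesture at what “hard, but not impossible” might look like: a minimal sketch of an FID-style metric for text, replacing the Inception image features with sentence embeddings and computing the same Fréchet distance between a reference corpus and a set of model samples. The embedding model here is an arbitrary assumption; nothing like this is established practice for LLMs.

```python
# Speculative sketch of an FID-style metric for LLM text: swap the
# Inception image features for sentence embeddings and compute the same
# Frechet distance between a reference corpus and model samples.
import numpy as np
from scipy.linalg import sqrtm
from sentence_transformers import SentenceTransformer

def frechet_text_distance(reference_texts, sampled_texts,
                          model_name="all-MiniLM-L6-v2"):
    model = SentenceTransformer(model_name)  # model choice is an assumption
    ref = model.encode(reference_texts)      # (n, d) embedding matrix
    gen = model.encode(sampled_texts)        # (m, d)

    mu_r, mu_g = ref.mean(axis=0), gen.mean(axis=0)
    cov_r = np.cov(ref, rowvar=False)
    cov_g = np.cov(gen, rowvar=False)

    # FID formula: ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2*(C_r C_g)^(1/2))
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):  # sqrtm can leave tiny imaginary noise
        covmean = covmean.real
    return float(((mu_r - mu_g) ** 2).sum()
                 + np.trace(cov_r + cov_g - 2 * covmean))

# A mode-collapsed model would show up as a large distance: its samples
# cluster in embedding space even when each one looks fine on its own.
```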
Of course, I’d also expect Claude’s mode collapse to be much subtler, simply because it’s working off less data, and so it’s less likely to have gotten rated text or inputs which would push it towards mode-collapsing onto easily-recognized rhyming poetry and away from harder-to-understand poetry. (Claude is just the ‘constitutional prompt’ model, right? Hard to see how a list of generic principles would push it towards rhyming-only.)
Does anyone know how much poetry and literary prose is in the pre-training sets aside from stuff in Common Crawl?
OA has been resolutely silent about the composition of datasets like Books1/Books2. But it seems safe to say that they would include all the obvious datasets like Project Gutenberg, so there is much more poetry/literary prose available than necessary; sample size should not be an issue. (Rhyming really is not that complex, if you understand phonetics.)
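To make “not that complex” concrete: given a pronouncing dictionary, a strict rhyme check is a few lines. A sketch using NLTK’s CMUdict; words missing from the dictionary and alternate pronunciations are ignored here for brevity.

```python
# Sketch: in the strict sense, two words rhyme if their phonemes match
# from the last stressed vowel onward. Uses NLTK's CMU Pronouncing
# Dictionary (run nltk.download('cmudict') once first).
from nltk.corpus import cmudict

pron = cmudict.dict()

def rhyme_part(word):
    """Phonemes from the last stressed vowel to the end (first pronunciation)."""
    phones = pron[word.lower()][0]
    for i in reversed(range(len(phones))):
        if phones[i][-1] in "12":   # stressed vowels end in 1 or 2, e.g. 'AE1'
            return tuple(phones[i:])
    return tuple(phones)            # fallback: word has no marked stress

def rhymes(a, b):
    return a.lower() != b.lower() and rhyme_part(a) == rhyme_part(b)

print(rhymes("cat", "hat"))   # True:  K-AE1-T / HH-AE1-T share AE1-T
print(rhymes("cat", "dog"))   # False
```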
Of course, I’d also expect Claude’s mode collapse to be much subtler, simply because it’s working off less data, and so it’s less likely to have gotten rated text or inputs which would push it towards mode-collapsing onto easily-recognized rhyming poetry and away from harder-to-understand poetry. (Claude is just the ‘constitutional prompt’ model, right? Hard to see how a list of generic principles would push it towards rhyming-only.)
To elaborate a bit more on this: as Owain notes, Claude is very good at writing poetry & text-style transfer (e.g. 1, 2, 3), and I really ought to try it more sometime.
Claude uses a variant of RLHF they dub ‘AIHF’. In the classic Christiano RLHF, you take a lot of text data from anywhere (such as users of an API) and label pairs by which one is better; your GPT model is finetuned to predict those labels, and then used as an oracle to train another GPT reinforcement-learning-style to maximize the reward from the oracle. In AIHF, you instead get your text data by starting with a do-gooder ‘principles’ prompt, full of things like the Declaration of Independence, using it to generate your large text dataset, and then doing RLHF on that.
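For reference, the heart of the Christiano pipeline is the pairwise preference loss used to train the reward oracle. A minimal PyTorch sketch; `reward_model` is a stand-in for a GPT with a scalar head, not a real API.

```python
# Minimal sketch of the reward-oracle step in Christiano-style RLHF:
# train a scalar reward head on labeled pairs with the Bradley-Terry
# pairwise loss. `reward_model` maps token ids -> (batch,) rewards and
# is a placeholder assumption here.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, preferred_ids, rejected_ids):
    """-log sigmoid(r(preferred) - r(rejected)), averaged over the batch."""
    r_pref = reward_model(preferred_ids)   # (batch,)
    r_rej = reward_model(rejected_ids)     # (batch,)
    return -F.logsigmoid(r_pref - r_rej).mean()

# The trained oracle is then frozen and used as the reward signal for an
# RL step (typically PPO) on the policy model. In the AIHF variant, the
# pairs and labels come from model-generated text judged against the
# 'principles', rather than from API users and human raters.
```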
In RLHF, by my theory of rhyming mode collapse, what happens is that some OA API users were playing around with poetry (such as myself), and those text samples would be used in comparisons by human raters. These human raters are usually not poetry connoisseurs, and have a bias towards easily-rated poetry (a laziness bias documented in RLHF papers, and a major challenge to RLHF in general), such as formal rhyming poetry. Rhyming poetry thus becomes highly rewarded by the preference model; and because the preference model doesn’t understand what rhyming is in general, it can only reward rhymes that the base model has already memorized. So the final model maximizes rhyming only within the set of memorized rhymes, leading to our observations: models which initially seem like amazing poets but are unable to write anything but rhymes (even when explicitly instructed otherwise), unable to write in different styles, always horribly bland, and frequently jamming in positive moralizing unasked-for.
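The dynamic is easy to reproduce in miniature: give a policy a reward that fully recognizes only a memorized subset of ‘good’ outputs, and reward maximization collapses onto that subset. A toy illustration, with all numbers invented:

```python
# Toy illustration (all numbers invented): a policy over four 'poem
# styles' trained by vanilla policy gradient against a reward that fully
# recognizes only memorized rhymes, as the theory above describes.
import numpy as np

styles = ["memorized rhyme", "novel rhyme", "free verse", "experimental"]
# The preference model verifies memorized rhymes, half-credits novel ones
# (it can't check rhyming in general), and under-rewards everything else.
reward = np.array([1.0, 0.5, 0.2, 0.1])

rng = np.random.default_rng(0)
logits = np.zeros(4)                  # start from a uniform 'base model'
for _ in range(500):
    p = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(4, p=p)
    advantage = reward[a] - (p * reward).sum()  # reward minus baseline
    grad = -p                         # grad of log p(a) is onehot(a) - p
    grad[a] += 1.0
    logits += 0.5 * advantage * grad  # REINFORCE update

p = np.exp(logits) / np.exp(logits).sum()
for s, q in zip(styles, p):
    print(f"{s:16s} {q:.3f}")         # nearly all mass on 'memorized rhyme'
```

The other modes remain representable in the final policy; they are just almost never sampled, which is why the collapse is invisible until you go looking for what is missing.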
You immediately see why AIHF would not produce mode collapse for rhyming, or many other things: there’s no reason that any of the ‘red team’ or self-generated text would involve poetry, and if it did, the ‘principles’ would be neutral about said poetry. (There is nothing in the UN Declaration of Human Rights saying that most contemporary non-rhyming poetry constitutes a crime against humanity, even if arguably it is.) So, AIHF should leave rhyming alone, preserving the base model’s capabilities intact and showing what models at that scale can really do.
This has motivated me to get around to signing up for Claude. It’s so depressing to punch in a prompt to GPT-4 which ought to be hilarious and creative, and then, no matter what the prompt is, out comes a high-school essay in 4 paragraphs which ends on an uplifting note.