What is going on with Constitutional AI? Does anyone know why no LLM aside from Claude (at least none that I can find) has used it? One would think that if it works about as well as RLHF (which it seems to), AI companies would be flocking to it to save on the cost of human labor?
Also, apparently ChatGPT didn’t know that Constitutional AI is RLAIF (until I reminded it), and Gemini thinks RLAIF and RLHF are the same thing. (Apparently not a fluke, as both models made the same error in 2 out of 3 tries.)
Isn’t the basic idea of Constitutional AI just having the AI provide its own training feedback, guided by written instructions? My guess is there was a substantial amount of self-evaluation in the o1 training, with complicated written instructions probably somewhat similar to a constitution (though this is just a guess).
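To make that idea concrete, here is a minimal sketch of AI feedback standing in for human feedback. The generate() call is a placeholder for whatever model or API is available, and the constitution text and prompt wording are made up for illustration, not Anthropic’s actual recipe:

```python
import random

# Illustrative principles; the actual Constitutional AI paper uses a longer list.
CONSTITUTION = [
    "Choose the response that is more helpful, honest, and harmless.",
    "Choose the response that is less likely to assist with anything illegal or dangerous.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to some LLM (an API or a local model)."""
    raise NotImplementedError

def ai_preference_label(question: str, response_a: str, response_b: str) -> dict:
    """Ask a judge model which of two responses better follows a sampled principle,
    and record its choice as a synthetic preference pair."""
    principle = random.choice(CONSTITUTION)
    prompt = (
        f"Principle: {principle}\n\n"
        f"Question: {question}\n\n"
        f"Response A: {response_a}\n\n"
        f"Response B: {response_b}\n\n"
        "Which response better follows the principle? Answer 'A' or 'B'."
    )
    choice = generate(prompt).strip().upper()
    chosen, rejected = (response_a, response_b) if choice.startswith("A") else (response_b, response_a)
    # A (prompt, chosen, rejected) triple is the same shape of data a human
    # labeler would produce for RLHF, just written by the model instead.
    return {"prompt": question, "chosen": chosen, "rejected": rejected}
```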
This is my impression too. See e.g. this recent paper from Google, where LLMs critique and revise their own outputs to improve performance in math and coding.
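For comparison, the critique-and-revise part (which produces instruction-tuning data rather than preference labels) looks roughly like the sketch below. This is the generic pattern, not the specific method from the Google paper or from the Constitutional AI paper, and generate() is the same placeholder LLM call as in the sketch above:

```python
def generate(prompt: str) -> str:
    """Placeholder for a call to some LLM, as in the previous sketch."""
    raise NotImplementedError

def critique_and_revise(question: str, draft: str, principle: str, n_rounds: int = 1) -> str:
    """Have the model critique its own draft against a written principle, then revise it.
    The (question, revised response) pairs can serve as synthetic instruction-tuning data."""
    response = draft
    for _ in range(n_rounds):
        critique = generate(
            f"Principle: {principle}\n\n"
            f"Question: {question}\nResponse: {response}\n\n"
            "Identify any ways the response violates the principle."
        )
        response = generate(
            f"Question: {question}\nResponse: {response}\nCritique: {critique}\n\n"
            "Rewrite the response so that it addresses the critique."
        )
    return response
```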
These posts might be relevant:
A recipe for frontier model post-training
Futures of the data foundry business model
The details of Constitutional AI seem highly contingent, while the general idea is simply automation of post-training data, so that the remaining external input is the “constitution”. In the original paper there are recipes both for instruction tuning data and for preference data. RLAIF is essentially RLHF run on synthetic preference data, perhaps together with a recipe for generating it. But preference data could also be used to run DPO or something else, in which case “RLAIF” becomes a misnomer for describing the automation of that preference data.
The Llama 3 report suggests that instruction tuning data can be largely automated, but human preference data is still better. And the data foundry business is still alive, so a lot of human data is at least not widely recognized as useless. But it’s unclear whether future models won’t soon do better than humans at labeling, or whether some leading labs’ models already do. Meta didn’t have a GPT-4-level model as a starting point before Llama 3, and then there are the upcoming 5e26 FLOPs models and o1-like reasoning models.
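To make the last point concrete: once you have (prompt, chosen, rejected) triples, whether they came from contractors or from a judge model, you can skip the reward model and optimize the standard DPO objective on them directly. A minimal sketch (the tensor names and batch format are assumptions):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Standard DPO loss. Each argument is a tensor of summed per-sequence
    log-probabilities from the policy being trained and from a frozen
    reference model, for the chosen and rejected responses of each pair."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid of the reward margin: increase the policy's relative
    # preference for the chosen response over the rejected one.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

The loss has no idea whether the labels were human or synthetic, which is part of why the RLAIF/RLHF distinction ends up being about the data recipe rather than the training algorithm.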
As a tangent to my question, I wonder how many AI companies are already using RLAIF without even being aware of it. From a recent WSJ story:
Early last year, Meta Platforms asked the startup to create 27,000 question-and-answer pairs to help train its AI chatbots on Instagram and Facebook.
When Meta researchers received the data, they spotted something odd. Many answers sounded the same, or began with the phrase “as an AI language model…” It turns out the contractors had used ChatGPT to write-up their responses—a complete violation of Scale’s raison d’être.
So they detected the cheating that time, but in RLHF, how would they know if contractors used AI to select which of two AI responses is preferred?
BTW here’s a poem(?) I wrote for Twitter, actually before coming across the above story:
The people try to align the board.
The board tries to align the CEO.
The CEO tries to align the managers.
The managers try to align the employees.
The employees try to align the contractors.
The contractors sneak the work off to the AI.
The AI tries to align the AI.
yyyep
Maybe others are using it in secret but don’t want to admit it for some reason? I can’t find any mention of Anthropic having filed a patent on the idea, but maybe other companies worry that admitting they were copying Anthropic’s idea would make them look like second-rate imitators?
Just speculating, I don’t know. Sure seems like a useful idea to copy.
AI companies don’t seem to be shy about copying RLHF though. Llama, Gemini, and Grok are all explicitly labeled as using RLHF.