Note: if you are doubtful about whether OA researchers would really be that lazy and might let poor data choices slide by, consider that the WSJ reported 3 days ago that Scale, the multi-billion-dollar data-labeling giant whose business for the past decade has been creating & cleaning data, blew a Facebook contract last year when the FB researchers actually looked at their data and noticed a lot of it starting “As an AI language model...”:
Facebook’s code name is Flamingo—a stuffed version of which sat atop an employee’s desk on a recent visit to the startup’s headquarters. After Scale AI bungled a project last year for the tech giant, Wang declared a company emergency and launched an all-hands-on-deck effort to fix the job, called Flamingo Revival, according to former Scale employees.
Early last year, Meta Platforms asked the startup to create 27,000 question-and-answer pairs to help train its AI chatbots on Instagram and Facebook. When Meta researchers received the data, they spotted something odd. Many answers sounded the same, or began with the phrase “as an AI language model…” It turns out the contractors had used ChatGPT to write up their responses—a complete violation of Scale’s raison d’être.
The researchers communicated the disappointing results to Scale, prompting Wang to rally the entire company to try to save the contract. He asked employees to drop everything and create new writing samples to send to Meta. An internal leaderboard showed who had completed the most labeling tasks. The prize for the winner: a paid vacation.
As usual, Hanlon’s razor can explain a lot about the world. (Amusingly, the HuggingFace “No Robots” instruction-tuning dataset advertises itself as “Look Ma, an instruction dataset that wasn’t generated by GPTs!”)
OK, I’m starting to see your point. Why do you think OpenAI is so successful despite this? Is their talent and engineering direction just that good? Is everyone else even worse at data management?
They (historically) had a large head start(up) on being scaling-pilled and various innovations like RLHF/instruction-tuning*, while avoiding pathologies of other organizations, and currently enjoy some incumbent advantages like what seems like far more compute access via MS than Anthropic gets through its more limited partnerships. There is, of course, no guarantee any of that will last, and it generally seems like (even allowing for the unknown capabilities of GPT-5 and benefits from o1 and everything else under the hood) the OA advantage over everyone else has been steadily eroding since May 2020.
* which, as much as I criticize their side-effects, have been crucial in democratizing LLM use for everybody who just wants to get something done, instead of learning the alien mindset of prompt-programming a base model