I think he’s just referring to DC evals, and I believe this is wrong: other companies doing evals wasn’t really caused by Anthropic (though I could be unaware of some facts).
Edit: maybe I don’t know what he’s referring to.
DC evals got started in summer of ’22, across all three leading companies AFAICT. And I was on the team that came up with the idea and started making it happen (both internally and externally), or at least, as far as I can tell, we came up with the idea: I remember discussions between Beth Barnes and Jade Leung (who were both on the team in spring ’22), and I remember thinking it was mostly their idea (maybe also Cullen’s?). It’s possible that they got it from Anthropic, but it didn’t seem that way to me. Update: OK, so apparently @evhub had joined Anthropic just a few months earlier [EDIT: this is false, evhub joined much later; I misread the dates, thanks Lawrence]. It’s possible the Frontier Red Team was created when he joined then, and information about it spread to the team I was on (but not to me). I’m curious to find out what happened here; anyone wanna weigh in?
At any rate, I don’t think any clone or near-clone of the Frontier Red Team exists at OpenAI or any other company outside Anthropic.
Evan joined Anthropic in late 2022, no? (E.g., his post announcing it was Jan 2023: https://www.alignmentforum.org/posts/7jn5aDadcMH6sFeJe/why-i-m-joining-anthropic)
I think you’re correct on the timeline: I remember Jade/Jan proposing DC Evals in April 2022 (which was novel to me at the time), and Beth started METR in June 2022. I don’t remember such teams actually doing work (at least not publicly known) when she pitched me on joining in August 2022.
It seems plausible that Anthropic’s scaling laws project was already underway before then (and this is what they’re referring to), but proliferating QA datasets feels qualitatively different from DC Evals. Also, they were definitely doing other red teaming, just none that seems to be DC Evals.
lol