Daniel Kokotajlo comments on Anthropic leadership conversation

Daniel Kokotajlo 21 Dec 2024 13:39 UTC
11 points
DC evals got started in summer of ’22, across all three leading companies AFAICT. And I was on the team that came up with the idea and started making it happen (both internally and externally), or at least, as far as I can tell, we came up with the idea: I remember discussions between Beth Barnes and Jade Leung (who were both on the team in spring ’22), and I remember thinking it was mostly their idea (maybe also Cullen’s?). It’s possible that they got it from Anthropic, but it didn’t seem that way to me. Update: OK, so apparently @evhub had joined Anthropic just a few months earlier [EDIT: this is false; evhub joined much later. I misread the dates, thanks Lawrence]; it’s possible the Frontier Red Team was created when he joined, and information about it spread to the team I was on (but not to me). I’m curious to find out what happened here, anyone wanna weigh in?

At any rate I don’t think there exists any clone or near-clone of the Frontier Red Team at OpenAI or any other company outside Anthropic.
- LawrenceC 21 Dec 2024 20:40 UTC
  8 points
  Evan joined Anthropic in late 2022, no? (E.g., his post announcing it was Jan 2023: https://www.alignmentforum.org/posts/7jn5aDadcMH6sFeJe/why-i-m-joining-anthropic)
  
  I think you’re correct on the timeline. I remember Jade/Jan proposing DC Evals in April 2022 (which was novel to me at the time), and Beth started METR in June 2022, and I don’t remember there being such teams actually doing work (at least not publicly known) when she pitched me on joining in August 2022.
  
  It seems plausible that Anthropic’s scaring laws project was already under way before then (and this is what they’re referring to), but proliferating QA datasets feels qualitatively different from DC Evals. Also, they were definitely doing other red teaming, just none that seems to be DC Evals.
  - ryan_greenblatt 21 Dec 2024 21:09 UTC
    5 points
    
    scaring laws
    
    lol