The main group of people working on alignment (other than interpretability) at OpenAI at the time of the Anthropic split at the end of 2020 was the Reflection team, which has since been renamed to the Alignment team. Of the 7 members of the team at that time (who are listed on the summarization paper), 4 are still working at OpenAI, and none are working at Anthropic.
I think this is literally true, but at least as far as I know it does not really convey the underlying dynamics, so I expect readers to walk away with the wrong impression.
Again, I might be totally wrong here, but as far as I understand it, the underlying dynamic was that there was a substantial contingent of people who worked at OpenAI because they cared about safety, but who worked in a variety of different roles, including many engineering roles. That contingent had pretty strong disagreements with leadership about a mixture of safety and other operating priorities (but I think mostly safety). Dario in particular had led a lot of the capabilities research and was dissatisfied with how the organization was run.
Dario left and founded Anthropic, taking a substantial amount of engineering and research talent with him (I don’t know the details, but I’ve heard statements to the effect that he took 2 of the 4 top engineers), and around the same time a substantial contingent of other people concerned about safety also left the organization, since I think they became much less optimistic about their ability to do safety research there in Dario’s absence.
Some of them went to Anthropic, others went to Redwood, and others went and did their own thing (e.g. Paul). Some former OpenAI staff who had left earlier then joined Anthropic as well.
I think it is interesting that nobody from the one team officially working on safety went directly to Anthropic (except Dario himself), but I think the above paragraph fails to convey the degree to which there was a substantial exodus out of OpenAI into Anthropic, and a general exodus of safety-concerned people out of OpenAI.
[I privately wrote the following quick summary of some publicly available information on (~safety-relevant) talent leaving OpenAI since the founding of Anthropic. It seems worth pasting here since it already exists, but I’d have been more careful if I had written it with public sharing in mind; it’s not comprehensive, and I don’t have time to really edit it. I’d advise against updating too hard on it because:
I basically don’t have any visibility into OpenAI.
Inferences from LinkedIn often don’t give a super accurate sense of somebody’s contribution.
I wrote down what I know about departures from OpenAI but didn’t try to write up new hires in the same way.
It’s often impossible for people at orgs to talk publicly about personnel issues/departures, so if Jacob/others don’t correct me, that’s not very strong evidence that everything below is accurate.]
The main group of people working on alignment (other than interpretability) at OpenAI at the time of the Anthropic split at the end of 2020 was the Reflection team, which has since been renamed to the Alignment team. Of the 7 members of the team at that time (who are listed on the summarization paper), 4 are still working at OpenAI, and none are working at Anthropic.
Like Habryka, I believe it’s literally true that nobody from the “Alignment team” left for Anthropic and that 4 of 7 are still working at OpenAI. But it seems possible that things look different if you weight by seniority and account for potential contributions to OpenAI’s attention to existential safety made by people who weren’t technical safety researchers, were researchers on another team, etc.
Important: I don’t know why the people below left OpenAI, and their inclusion doesn’t mean there’s any bad blood between them and OpenAI, or that they necessarily have criticisms of OpenAI’s attitude toward safety.
If I understand correctly:
1. The alignment team lost its team lead (Paul).
2. Two senior people who weren’t officially on the team, but who oversaw it or helped with its research direction, left for Anthropic:
VP of Safety and Policy (Daniela), whose LinkedIn says she oversaw the safety and policy teams.
VP of Research (Dario), who was the Team Lead for AI Safety before he was promoted and who says he built and led several of their long-term safety teams. He was also an author on the summarization paper Jacob references. I’d guess that he continued to contribute to their AI safety work after being promoted.
3. The head of the interpretability team, Chris Olah, left for Anthropic; interpretability is one of the other teams that seems most relevant to existential safety.
(Jacob acknowledges this earlier in the post)
4. Other Anthropic co-founders who left OpenAI include:
Tom Brown (led the engineering of GPT-3)
Sam McCandlish and Jared Kaplan (just a consultant at OpenAI), who I think led their scaling laws research? I think I heard Jared is leading an Anthropic alignment team, and I think Sam M did a fellowship on the safety team before building the scaling laws team.
5. Another person who worked on technical safety at OpenAI and left for Anthropic:
Tom Henighan, who was a member of technical staff on the safety team, though I guess not on the alignment team?
6. Several people on the policy team left for Anthropic, including the director and two EAs who are interested in alignment:
Policy Director, Jack Clark
Danny Hernandez
Amanda Askell
7. Another EA who I believe cares about alignment and left OpenAI for Anthropic:
Nicholas Joseph
8. Other people I don’t know who left for Anthropic:
Kamal Ndousse
Benjamin Mann, whose LinkedIn says he was on their security and safety working groups.
9. Holden is no longer on OpenAI’s board (though Helen Toner now is).
On the other hand, they’ve also hired some EAs who care about alignment since then. I believe examples include:
Jan Leike, alignment team lead
Richard Ngo, team lead(?) for futures subteam of the policy team
Daniel Kokotajlo, futures subteam
Surely there are others I don’t know of or am leaving out.
Without commenting on the specifics, I have edited the post to mitigate potential confusion: “this fact alone is not intended to provide a complete picture of the Anthropic split, which is more complicated than I am able to explain here”.