To anyone reading this who is considering working in alignment --
Following the recent revelations, I now believe OpenAI should be regarded as a bad faith actor. If you go work at OpenAI, I believe your work will be net negative, and will most likely be used to “safety-wash” or “governance-wash” Sam Altman’s mad dash to AGI. It now appears Sam Altman is at least as sketchy as SBF. Attempts to build “social capital” or “affect the culture from the inside” will not work under current leadership (indeed, what we’re currently seeing are the failed results of 5+ years of such attempts). I would very strongly encourage anyone looking to contribute to stay away from OpenAI.
I recognize this is a statement, and not an argument. I don’t have the time to write out the full argument. But I’m leaving this comment here, such that others can signal agreement with it.
I’m at ~40% on “4 years from now, I’ll think it was clearly the right call for alignment folk to just stop working at OpenAI, completely.”
But, I think it’s much more likely that I’ll continue endorsing something like “Treat OpenAI as a manipulative adversary by default, do not work there or deal with them unless you have a concrete plan for how you are being net positive. And because there’s a lot of optimization power in their company, be pretty skeptical that any plans you make will work. Do not give them free resources (like inviting them to EA Global or job fairs).”
I think it’s nonetheless good to have some kind of “stated terms” for what actions OpenAI / Sam etc. could take that might make it more worthwhile to work with them in the future (or, to reduce active opposition to them). Ultimately, I think OpenAI is on track to destroy the world, and I think actually stopping them will somehow require their cooperation at some point. So I don’t think I’d want to totally burn bridges.
But I also don’t think there’s anything obvious Sam or OpenAI can do to “regain trust.” I think the demonstrated actions with the NDAs, and Sam’s deceptive non-apology, mean they’ve lost the ability to credibly signal good faith.
...
Some background:
Last year, when I was writing “Carefully Bootstrapped Alignment” is organizationally hard, I chatted with people at various AI labs.
I came away with the impression that Anthropic kinda has a culture/leadership that (might, possibly) be worth investing in (but which I’d still need to see more proactive positive steps to really trust), and that DeepMind was in a weird state where its culture wasn’t very unified, but the leadership seemed at least vaguely in the right place.
I still had a lot of doubts about those companies, but when I talked to people I knew there, I got at least some sense that there was an internal desire to be safety-conscious.
When I talked to people at OpenAI, the impression I came away with was “there’s really no hope of changing the culture there. Do not bother trying.”
(I think the people I talked to at all orgs were generally not optimistic about changing culture, and instead more focused on developing standards that could eventually turn into regulations, which would make it harder for the orgs to back out of agreements)
That was last year, before the seriousness of the nondisparagement clauses and the pressure put on people became more clear-cut. And, before reading Zach’s post about how AI companies aren’t really using external evaluators.
Hm, I disagree and would love to operationalize a bet/market on this somehow; one approach is something like “Will we endorse Jacob’s comment as ‘correct’ 2 years from now?”, resolved by a majority of Jacob + Austin + <neutral 3rd party>, after deliberating for ~30m.
Sure that works! Maybe use a term like “importantly misguided” instead of “correct”? (Seems easier for me to evaluate)
Mostly seems sensible to me (I agree that a likely model is that there’s a lot of deceptive and manipulative behavior coming from the top and that caring about extinction risks was substantially faked), except that I would trust an agreement from Altman much more than an agreement from Bankman-Fried.
I weakly disagree. The fewer safety-motivated people want to work at OpenAI, the stronger the case for any given safety person to work there.
Also, now that there are enough public scandals, hopefully anybody wanting to work at OpenAI will be sufficiently guarded and going in with their eyes fully open, rather than naive/oblivious.
Counter-counter-argument: the safety-motivated people, especially if entering at a low level, have ~zero ability to change anything for the better internally, while they could usefully contribute elsewhere, and the presence of token safety-motivated people at OpenAI improves OpenAI’s ability to safety-wash its efforts (by pointing at them and going “look how many resources we’re giving them!”, like was attempted with Superalignment).