80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)

Raemon 3 Jul 2024 20:34 UTC

274 points

AI Alignment Fieldbuilding Effective altruism OpenAI AI

I haven’t shared this post with other relevant parties – my experience has been that private discussion of this sort of thing is more paralyzing than helpful. I might change my mind in the resulting discussion, but, I prefer that discussion to be public.

I think 80,000 hours should remove OpenAI from its job board, and similar EA job placement services should do the same.

(I personally believe 80k shouldn’t advertise Anthropic jobs either, but I think the case for that is somewhat less clear)

I think OpenAI has demonstrated a level of manipulativeness, recklessness, and failure to prioritize meaningful existential safety work that makes me think EA orgs should not be going out of their way to give them free resources. (It might make sense for some individuals to work there, but this shouldn’t be a thing 80k or other orgs are systematically funneling talent into.)

There plausibly should be some kind of path to get back into good standing with the AI Risk community, although it feels difficult to imagine how to navigate that, given how adversarial OpenAI’s use of NDAs was, and how difficult that makes it to trust future commitments.

The things that seem most significant to me:

They promised the superalignment team 20% of their compute-at-the-time (which AFAICT wasn’t even a large fraction of their compute over the coming years), but didn’t provide anywhere close to that, and then disbanded the team when Leike left.
Their widespread use of non-disparagement agreements, with non-disclosure clauses, which generally makes it hard to form accurate impressions about what’s going on at the organization.
Helen Toner’s description of how Sam Altman wasn’t forthright with the board. (i.e. “The board was not informed about ChatGPT in advance and learned about ChatGPT on Twitter. Altman failed to inform the board that he owned the OpenAI startup fund despite claiming to be an independent board member, giving false information about the company’s formal safety processes on multiple occasions. And relating to her research paper, that Altman in the paper’s wake started lying to other board members in order to push Toner off the board.”)
Hearing from multiple ex-OpenAI employees that OpenAI safety culture did not seem on track to handle AGI. Some of these are public (Leike, Kokotajlo), others were in private.

This is before getting into more open-ended arguments like “it sure looks to me like OpenAI substantially contributed to the world’s current AI racing” and “we should generally have a quite high bar for believing that the people running a for-profit entity building transformative AI are doing good, instead of causing vast harm, or at best, being a successful for-profit company that doesn’t especially warrant help from EAs.”

I am generally wary of AI labs (including Anthropic and DeepMind), and think EAs should be less optimistic about working at large AI orgs, even in safety roles. But, I think OpenAI has demonstrably messed up, badly enough, publicly enough, in enough ways that it feels particularly wrong to me for EA orgs to continue to give them free marketing and resources.

I’m mentioning 80k specifically because I think their job board seemed like the largest funnel of EA talent, and because it seemed better to pick a specific org than a vague “EA should collectively do something.” (see: EA should taboo “EA should”). I do think other orgs that advise people on jobs or give platforms to organizations (e.g. the organization fair at EA Global) should also delist OpenAI.

My overall take is something like: it is probably good to maintain some kind of intellectual/diplomatic/trade relationships with OpenAI, but bad to continue giving them free extra resources, or treat them as an org with good EA or AI safety standing.

It might make sense for some individuals to work at OpenAI, but doing so in a useful way seems very high skill, and high-context – not something to funnel people towards in a low-context job board.

I also want to clarify: I’m not against 80k continuing to list articles like Working at an AI Lab, which are more about how to make the decisions, and go into a fair bit of nuance. I disagree with that article, but it seems more like “trying to lay out considerations in a helpful way” than “just straightforwardly funneling people into positions at a company.” (I do think that article seems out of date and worth revising in light of new information. I think “OpenAI seems inclined towards safety” now seems demonstrably false, or at least less true in the ways that matter. And this should update you on how true it is for the other labs, or how likely it is to remain true)

FAQ / Appendix

Some considerations and counterarguments which I’ve thought about, arranged as a hypothetical FAQ.

Q: It seems that, like it or not, OpenAI is a place transformative AI research is likely to happen, and having good people work there is important.

Isn’t it better to have alignment researchers working there, than not? Are you sure you’re not running afoul of misguided purity instincts?

I do agree it might be necessary to work with OpenAI, even if they are reckless and negligent. I’d like to live in the world where “don’t work with companies causing great harm” was a straightforward rule to follow. But we might live in a messy, complex world where some good people may need to work with harmful companies anyway.

But: we’ve now had two waves of alignment people leave OpenAI. The second wave has multiple people explicitly saying things like “quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI.”

My guess is that the first wave were mostly under non-disclosure/non-disparagement agreements, so we can’t take their lack of criticism as much evidence.

It looks to me, from the outside, like OpenAI is just not really structured or encultured in a way that makes it that viable for someone on the inside to help improve things much. I don’t think it makes sense to continue trying to improve OpenAI’s plans, at least until OpenAI has some kind of credible plan (backed up by actions) of actually seriously working on existential safety.

I think it might make sense for some individuals who have an explicit plan for how to interface with the organizational culture to go work at OpenAI anyway. But I think this is a very high-context, high-skill job. (e.g. skills like “keeping your eye on the AI safety ball”, “interfacing well with OpenAI staff/leadership while holding onto your own moral/strategic compass”, “knowing how to prioritize research that differentially helps with existential safety, rather than mostly amounting to near-term capabilities work.”)

I don’t think this is the sort of thing you should just funnel people into on a jobs board.

I think it makes a lot more sense to say “look, you had your opportunity to be taken on faith here, you failed. It is now OpenAI’s job to credibly demonstrate that it is worthwhile for good people to join there trying to help, rather than for them to take that on faith.”

Q: What about jobs like “security research engineer”?

That seems straightforwardly good for OpenAI to have competent people for, and probably doesn’t require a good “Safety Culture” to pay off?

The argument for this seems somewhat plausible. I still personally think it makes sense to fully delist OpenAI positions unless they’ve made major changes to the org (see below).

I’m operating here from a cynical, conflict-theory-esque stance. I think OpenAI has exploited the EA community, and it makes sense to engage with them accordingly. I think it makes more sense to say, collectively, “knock it off”, and switch to a default of “apply pressure.” If OpenAI wants to find good security people, that should be their job, not the job of EA organizations.

But, I don’t have a really slam dunk argument that this is the right stance to take. For now, I list it as my opinion, but acknowledge there are other worldviews where it’s less clear what to do.

Q: What about offering OpenAI a path towards “good standing”?

It seems plausibly important to me to offer some kind of roadmap back to good standing. I do kinda think regulating OpenAI from the outside isn’t likely to be sufficient, because it’s too hard to specify what actually matters for existential AI safety.

So, it feels important to me not to fully burn bridges.

But, it seems pretty hard to offer any particular roadmap. We’ve got three different lines of OpenAI leadership breaking commitments, and being manipulative. So we’re long past the point where “mere words” would reassure me.

Things that would reassure me are costly actions that are extremely unlikely in worlds where OpenAI would (intentionally or not) lure more people in and then still turn out to, nope, just be taking advantage of them for safety-washing / regulatory capture reasons.

Such actions seem pretty unlikely by now. Most of the examples I can think to spell out seem too likely to be gameable (e.g. if OpenAI were to announce a new Superalignment-equivalent team, or commitments to participate in eval regulations, I would guess they would only do the minimum necessary to look good, rather than a real version of the thing).

An example that’d feel pretty compelling: if Sam Altman actually, for real, left the company, that would definitely have me re-evaluating my sense of the company. (This seems like a non-starter, but I’m listing it for completeness.)

I wouldn’t put much stock in a Sam Altman apology. If Sam is still around, the most I’d hope for is some kind of realistic, real-talk, arms-length negotiation where it’s common knowledge that we can’t really trust each other but maybe we can make specific deals.

I’d update somewhat if Greg Brockman and other senior leadership (i.e. people who seem to actually have the respect of the capabilities and product teams), or maybe new board members, made clear statements indicating:

they understand how OpenAI messed up (in terms of not keeping commitments, and the manipulativeness of the non-disclosure/non-disparagement agreements).
they are taking actions that hold Sam (and maybe themselves in some cases) accountable.
they take existential risk seriously on a technical level. They have real cruxes for what would change their current scaling strategy. This is integrated into org-wide decisionmaking.

This wouldn’t make me think “oh everything’s fine now.” But would be enough of an update that I’d need to evaluate what they actually said/did and form some new models.

Q: What if we left up job postings, but with an explicit disclaimer linking to a post saying why people should be skeptical?

This idea just occurred to me as I got to the end of the post. Overall, I think this doesn’t make sense given the current state of OpenAI, but thinking about it opens up some flexibility in my mind about what might make sense, in worlds where we get some kind of costly signals or changes in leadership from OpenAI.

(My actual current guess is this sort of disclaimer makes sense for Anthropic and/or DeepMind jobs. This feels like a whole separate post though)

My actual range of guesses here is more cynical than this post focuses on. I’m focused on things that seemed easy to legibly argue for.

I’m not sure who has decisionmaking power at 80k, or most other relevant orgs. I expect many people to feel like I’m still bending over backwards being accommodating to an org we should have lost all faith in. I don’t have faith in OpenAI, but I do still worry about escalation spirals and polarization of discourse.

When dealing with a potentially manipulative adversary, I think it’s important to have backbone and boundaries and actual willingness to treat the situation adversarially. But also, it’s important to leave room to update or negotiate.

But, I wanted to end with explicitly flagging the hypothesis that OpenAI is best modeled as a normal profit-maximizing org, that it basically co-opted EA into being a lukewarm ally it could exploit, when it’d have made sense to treat OpenAI more adversarially from the start (or at least be more “ready to pivot towards treating adversarially”).

I don’t know that that’s the right frame, but I think the recent revelations should be an update towards that frame.
