I’m at ~40% on “4 years from now, I’ll think it was clearly the right call for alignment folk to just stop working at OpenAI, completely.”
But I think it’s much more likely that I’ll continue endorsing something like: “Treat OpenAI as a manipulative adversary by default. Don’t work there or deal with them unless you have a concrete plan for how you are being net positive, and because there’s a lot of optimization power in their company, be pretty skeptical that any plan you make will work. Don’t give them free resources (like inviting them to EA Global or job fairs).”
I think it’s nonetheless good to have some kind of “stated terms” for what actions OpenAI, Sam Altman, etc. could take that might make it more worthwhile to work with them in the future (or at least reduce active opposition to them). Ultimately, I think OpenAI is on track to destroy the world, and I think actually stopping them will somehow require their cooperation at some point. So I don’t think I’d want to totally burn bridges.
But I also don’t think there’s anything obvious Sam or OpenAI can do to “regain trust.” I think the demonstrated actions with the NDAs, and Sam’s deceptive non-apology, mean they’ve lost the ability to credibly signal good faith.
...
Some background:
Last year, when I was writing “‘Carefully Bootstrapped Alignment’ is organizationally hard,” I chatted with people at various AI labs.
I came away with the impression that Anthropic kinda has a culture/leadership that might (possibly) be worth investing in (though I’d still need to see more proactive positive steps to really trust it), and that DeepMind was in a weird state where its culture wasn’t very unified, but the leadership seemed at least vaguely in the right place.
I still had a lot of doubts about those companies, but when I talked to people I knew there, I got at least some sense that there was an internal desire to be safety-conscious.
When I talked to people at OpenAI, the impression I came away with was “there’s really no hope of changing the culture there. Do not bother trying.”
(I think the people I talked to at all the orgs were generally not optimistic about changing culture, and were instead more focused on developing standards that could eventually turn into regulations, which would make it harder for the orgs to back out of agreements.)
That was last year, before the seriousness of the nondisparagement clauses, and the pressure put on people, became more clear-cut. And before I read Zach’s post about how AI companies aren’t really using external evaluators.