I’m increasingly worried about evaporative cooling after all of those people left OpenAI. It’s good to have some symbolic protests, but there’s also a selfish component to protecting your ideals and reputation within your in-group.
I haven’t gotten around to writing about this, so here’s a brief sketch of my argument for why safety-focused people should take any opportunity to work at OpenAI, to say nothing of DeepMind or Anthropic, which I consider much better on this front. There’s one major caveat, in the last section, about your work and mindset shifting from x-risk to the much less impactful mundane safety.
Let’s ask about the decision in terms of counterfactuals: work at an AGI company compared to what?
The alternative seems to be this: someone who is less concerned with safety than you, and probably slightly less skilled, works there instead, while someone who cares a lot about safety (maybe you) doesn’t work on safety at all. The person who takes the job in your place cares less about safety than you do almost by definition, since they accepted the offer you turned down. And if they were actually more skilled than you but less motivated for that particular job (which is what put them slightly below you on the hiring list), the drop in safety-caring could be pretty large.
In that world, you don’t get talked out of caring about safety, but neither do you shift the company hivemind toward caring about it.
Here’s the logic:
Suppose you turn down the job because, for you in particular, the reasons against outweigh the reasons for. The next candidate down the company’s hiring list takes the offer. Now you try to get funding to work on AI safety independently. The field is currently considered pretty sharply funding-limited, so either you or someone else won’t wind up getting funded. Maybe you or they do good work anyway without funding, but doing a lot of it seems pretty darned unlikely.
So now there’s someone working on safety at the company who cares less about it, and someone who cares more isn’t working on safety at all.
Now to the arguments about shifting your type of work and type of mindset. People worry a lot that if they work at a major org, they will be corrupted and lose their focus on safety. This will definitely happen. Humans are extremely susceptible to peer pressure in forming their beliefs; see my brief bit on motivated reasoning for some explanation and arguments, but I think it’s pretty obvious that this is a big factor in how people form and change beliefs and motivations. Those who think they’re not influenced by peer pressure are more vulnerable, not less (even if they’re partly correct), because they’re blind to the emotional tugs from respected peers. The few people who truly are mostly immune are usually so contrarian that they aren’t even candidates for working in orgs; they make terrible team members precisely because they’re blind to how others want them to behave. So, yes, you’ll be corrupted.
But this works both ways! You’ll also be shifting the beliefs of the org at least a little toward taking x-risk seriously. How much is in question, but on net I think the exchange comes out positive.
How much of each of these happens will be a product of a few things: how charming you are, how skilled you are at presenting your views in an appealing (and not irritating) way, and how thoroughly thought-out they are.
Presumably, truth is on your side, and you are dealing with people who at least fancy themselves fans of the truth. Over time, thorough discussions among intelligent people should lean toward taking x-risk seriously.
Thus, the sum should be that there is more x-risk concern in the world if you work at the org.
There are some ways this could fail to be true. But in my view, most of the arguments against are pretty colored by a tendency to want to impress the in-group: x-risk-concerned rationalists.