This covers the altruistic reasons for and against working on technical AI safety at a frontier lab pretty well. I think the main reason for working at a frontier lab, however, is not altruistic: it offers more money and status than working elsewhere, so it would be nice to be clear-eyed about this.
To be clear, on balance, I think it’s pretty reasonable to want to work at a frontier lab, even based on the altruistic considerations alone.
What seems harder to justify altruistically, however, is why so many of us work on, and fund, the same kinds of safety work outside of frontier labs that is done at frontier labs. After all, many of the downsides are the same: low neglectedness, safetywashing, shortening timelines, and benefiting (via industry grant programs) from the success of AI labs. Granted, it's not impossible to get hired by a frontier lab later. But on balance, I'm not sure that the altruistic impact is so good. I do think, however, that it is a pretty good option on non-altruistic grounds, given the current abundance of funding.
It's important to be careful about the boundaries of "the same sort of safety work." For example, my understanding is that "Alignment faking in large language models" started as a Redwood Research project, and Anthropic only became involved later. Maybe Anthropic would have done similar work soon anyway if Redwood hadn't started this project. But, then again, maybe not. By working on things that labs might be interested in, you can potentially get them to prioritize things that are in scope for them in principle but which they might nevertheless neglect.
Agreed that this post presents the altruistic case.
I discuss both the money and status points in the "career capital" paragraph (though perhaps I should have factored them out).