I’d define “genuine safety role” as “any qualified person will increase safety faster than capabilities in the role”. I put ~0 likelihood that OAI has such a position. The best you could hope for is being a marginal support for a safety-based coup (which has already been attempted, and failed).
“~0 likelihood” means that you are nearly certain that OAI does not have such a position (ie, your usage of “likelihood” has the same meaning as “degree of certainty” or “strength of belief”)? I’m being pedantic because I’m not a probability expert and AFAIK “likelihood” has some technical usage in probability.
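(For what it’s worth, the technical usage I’m gesturing at, as I understand it, is that the likelihood is the probability of the observed data given a hypothesis, treated as a function of the hypothesis, whereas what I think you mean is a credence, ie a posterior degree of belief in the hypothesis itself. Using H for the hypothesis and D for the evidence, purely for illustration:
$$\mathcal{L}(H \mid D) = P(D \mid H) \qquad \text{vs.} \qquad P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$
)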
If you’re up for answering more questions like this, then how likely do you believe it is that OAI has a position where at least 90% of people who are both (A) qualified skill-wise (eg, an ML and interpretability expert) and (B) believe that AIXR is a serious problem would increase safety faster than capabilities in that position?
There’s a different question of “could a strategic person advance net safety by working at OpenAI, more so than any other option?”. I believe people like that exist, but they don’t need 80k to tell them about OpenAI.
This is a good point, and your mentioning it updates me towards believing that you are more motivated by (1) finding out what’s true regarding AIXR and (2) reducing AIXR than by something like (3) shit talking OAI.
how likely do you believe it is that OAI has a position where at least 90% of people who are both (A) qualified skill-wise (eg, an ML and interpretability expert) and (B) believe that AIXR is a serious problem would increase safety faster than capabilities in that position?
The cheap answer here is 0, because I don’t think there is any position where that level of skill and belief in AIXR has a 90% chance of increasing net safety. Ability to do meaningful work in this field is rarer than that.
So the real question is: how does OpenAI compare to other possibilities? To be specific, let’s compare against being an LTFF-funded solo researcher, working in academia, and working at Anthropic.
Working at OpenAI seems much more likely to boost capabilities than solo research, and probably more likely than academia. Some of that is because solo researchers and academics are less likely to produce anything at all, but that’s because they face orders of magnitude less pressure to produce, which is an advantage in this case. LTFF is not a pressure- or fad-free zone, but it has nothing near the leverage of paying someone millions of dollars, or of surrounding them for tens of hours each week with people who are also paid millions of dollars to believe they’re doing safe work.
I feel less certain about Anthropic. It doesn’t have any of the terrible signs OpenAI did (like the repeated safety exoduses, the board coup, and clawbacks on employee equity), but we didn’t know about most of those a year ago.
If we’re talking about a generic skilled and concerned person, probably the most valuable thing they can do is support someone with good research vision. My impression is that these people are more abundant at Anthropic than OpenAI, especially after the latest exodus, but I could be wrong. This isn’t a crux for me for the 80k board[1] but it is a crux for how much good could be done in the role.
Some additional bits of my model:
I doubt OpenAI is going to tell a dedicated safetyist they’re off the safety team and on direct capabilities. But the distinction is not always obvious, and employees will be very motivated to not fight OpenAI on marginal cases.
You know those people who stand too close, so you back away, and then they move closer? Your choices in that situation are to steel yourself for an intense battle, accept the distance they want, or leave. Employers can easily pull that off at scale. They make the question become “am I sure this will never be helpful to safety?” rather than “what is the expected safety value of this research?”
Alternate frame: How many times will an entry-level engineer get to say no before he’s fired?
I have a friend who worked at OAI. They’d done all the right soul-searching and concluded they were doing good alignment work. Then they quit, and a few months later they were aghast at concerns they’d previously dismissed. Once you are in the situation, it is very hard to maintain accurate perceptions.
Something @Buck said made me realize I was conflating “produce useful theoretical safety work” with “improve the safety of OpenAI’s products.” I don’t think OpenAI will stop production for safety reasons[2], but they might fund theoretical work that is useful to others, or that is cheap to follow themselves (perhaps because it boosts capabilities as well...).
This is a good point, and your mentioning it updates me towards believing that you are more motivated by (1) finding out what’s true regarding AIXR and (2) reducing AIXR than by something like (3) shit talking OAI.
Thank you. My internal experience is that my concerns stem from x-risk (and, belatedly, the wage theft). But OpenAI has enough signs of harm and enough signs of hiding harm that I’m fine with shit talking as a side effect, where normally I’d try for something more cooperative and with lines of retreat.
I think the clawbacks are disqualifying on their own, even if they had no safety implications. They stole money from employees! That’s one of the top 5 signs you’re in a bad workplace. 80k doesn’t even mention this.
To ballpark quantify: I think there is a <5% chance that OpenAI slows production by 20% or more in order to reduce AIXR. And I believe frontier AI companies need to be prepared to slow by more than that.
I asked a related question a few months ago: if one becomes doom-pilled while working as an executive at an AI lab and strongly values survival, what should one do?