From Conor’s response on EAForum, it sounds like the answer is “we trust OpenAI to tell us”. In light of what we already know (safety team exodus, punitive and hidden NDAs, lack of disclosure to OpenAI’s governing board), that level of trust seems completely unjustified to me.
I would be shocked if OpenAI employees who took the role with that job description were pushed into doing capabilities research they didn’t want to do. (Obviously it’s plausible that they’d choose to do capabilities research while they were already there.)
Huh, this doesn’t super match my model. I have heard of people at OpenAI being pressured a lot into making sure their safety work helps with productization. I would be surprised if they ended up being pressured into working directly on the scaling team, but I wouldn’t be surprised by someone being pressured into doing some better AI censorship in a way that doesn’t have any relevance to AI safety and does indeed make OpenAI a lot of money.
I wouldn’t be surprised by someone being pressured into doing some better AI censorship in a way that doesn’t have any relevance to AI safety and does indeed make OpenAI a lot of money.
I disagree: for the role advertised, I would be surprised by that. (I’d be less surprised if they advised on some post-training stuff that you’d think of as capabilities; I think that the “AI censorship” work is mostly done by a different team that doesn’t talk to the superalignment people that much. But idk where the superoversight people have been moved in the org; maybe they’d more naturally talk more now.)
Can you clarify what you mean by “completely unjustified”? For example, if OpenAI says “This role is a safety role.”, then in your opinion, what is the probability that the role is a genuine safety role?
I’d define “genuine safety role” as “any qualified person will increase safety faster than capabilities in the role”. I put ~0 likelihood that OAI has such a position. The best you could hope for is being a marginal support for a safety-based coup (which has already been attempted, and failed).
There’s a different question of “could a strategic person advance net safety by working at OpenAI, more so than any other option?”. I believe people like that exist, but they don’t need 80k to tell them about OpenAI.
I’d define “genuine safety role” as “any qualified person will increase safety faster than capabilities in the role”. I put ~0 likelihood that OAI has such a position.
Which of the following claims are you making?
OpenAI doesn’t have any roles doing AI safety research aimed at reducing catastrophic risk from egregious AI misalignment; people who think they’re taking such a role will end up assigned to other tasks instead.
OpenAI does have roles where people do AI safety research aimed at reducing catastrophic risk from egregious AI misalignment, but all the research done by people in those roles sucks and the roles contribute to OpenAI having a good reputation, so taking those roles is net negative.
I find the first claim pretty implausible. E.g. I think that the recent SAE paper and the recent scalable oversight paper obviously count as an attempt at AI safety research. I think that people who take roles where they expect to work on research like that basically haven’t ended up unwillingly shifted to roles on e.g. safety systems, core capabilities research, or product stuff.
I’m not Elizabeth or Ray, but there’s a third option which I read the comment above to mean, and which I myself find plausible.
OpenAI does have roles that are ostensibly aimed at reducing catastrophic risk from egregious AI misalignment. However, without more information, an outsider should not expect that those roles actually accelerate safety more than they accelerate capabilities.
Successfully increasing safety faster than capabilities requires that person to have a number of specific skills (eg political savvy, robustness to social pressure, a higher-granularity strategic/technical model than most EAs have in practice, etc.), over and above the skills that would be required to get hired for the role.
Lacking those skills, a hire for such a role is more likely to do harm than good, not primarily because they’ll be transitioned to other tasks, but because much of the work that the typical hire for such a role would end up doing either 1) doesn’t help or 2) will end up boosting OpenAI’s general capabilities more than it helps.
Furthermore, by working at OpenAI at all, they provide some legitimacy to the org as a whole, and to the existentially dangerous work happening in other parts of it, even if their work does 0 direct harm. Someone working in such a role has to do sufficiently beneficial on-net work to overcome this baseline effect.
I’m not Elizabeth and probably wouldn’t have worded my thoughts quite the same, but my own position regarding your first bullet point is:
“When I see OpenAI list a ‘safety’ role, I’m like 55% confident that it has much to do with existential safety, and maybe 25% that it produces more existential safety than existential harm.”
When you say “when I see OpenAI list a ‘safety’ role”, are you talking about roles related to superalignment, or are you talking about all roles that have safety in the name? Obviously OpenAI has many roles that are aimed at various near-term safety stuff, and those might have safety in the name, but this isn’t duplicitous in the slightest—the job descriptions (and maybe even the rest of the job titles!) explain it perfectly clearly so it’s totally fine.
I assume you meant something like “when I see OpenAI list a role that seems to be focused on existential safety, I’m like 55% that it has much to do with existential safety”? In that case, I think your number is too low.
I was thinking of things like the Alignment Research Science role. If they talked up “this is a superalignment role”, I’d have an estimate higher than 55%.
We are seeking Researchers to help design and implement experiments for alignment research. Responsibilities may include:
Writing performant and clean code for ML training
Independently running and analyzing ML experiments to diagnose problems and understand which changes are real improvements
Writing clean non-ML code, for example when building interfaces to let workers interact with our models or pipelines for managing human data
Collaborating closely with a small team to balance the need for flexibility and iteration speed in research with the need for stability and reliability in a complex long-lived project
Understanding our high-level research roadmap to help plan and prioritize future experiments
Designing novel approaches for using LLMs in alignment research
Yeah, I think that this is disambiguated by the description of the team:
OpenAI’s Alignment Science research teams are working on technical approaches to ensure that AI systems reliably follow human intent even as their capabilities scale beyond human ability to directly supervise them.
We focus on researching alignment methods that scale and improve as AI capabilities grow. This is one component of several long-term alignment and safety research efforts at OpenAI, which we will provide more details about in the future.
So my guess is that you would call this an alignment role (except for the possibility that the team disappears because of superalignment-collapse-related drama).
Yeah I read those lines, and also “Want to use your engineering skills to push the frontiers of what state-of-the-art language models can accomplish”, and remain skeptical. I think OpenAI tends to equivocate on how they use the word “alignment” (or: they use it consistently, but not in a way that I consider obviously good. Like, I think the people working on RLHF a few years ago probably contributed to ChatGPT being released earlier, which I think was bad*).
*I like the part where the world feels like it’s actually starting to respond to AI now, but, I think that would have happened later, with more serial-time for various other research to solidify.
(I think this is a broader difference in guesses about what research/approaches are good, which I’m not actually very confident about, esp. compared to habryka, but, is where I’m currently coming from)
*I like the part where the world feels like it’s actually starting to respond to AI now, but, I think that would have happened later, with more serial-time for various other research to solidify.
And with less serial-time for various policy plans to solidify and gain momentum.
If you think we’re irreparably far behind on the technical research, and advocacy / political action is relatively more promising, you might prefer to trade years of timeline for earlier, more widespread awareness of the importance of AI, and a relatively long period of people pushing on policy plans.
Good question. My revised belief is that OpenAI will not sufficiently slow down production in order to boost safety. It may still produce theoretical safety work that is useful to others, and to itself if the changes are cheap to implement.
I do also expect many people assigned to safety to end up doing more work on capabilities, because the distinction is not always obvious and they will have so many reasons to err in the direction of agreeing with their boss’s instructions.
Ok but I feel like if a job mostly involves researching x-risk-motivated safety techniques and then publishing them, it’s very reasonable to call it an x-risk-safety research job, regardless of how likely the organization where you work is to adopt your research eventually when it builds dangerous AI.
I’d define “genuine safety role” as “any qualified person will increase safety faster than capabilities in the role”. I put ~0 likelihood that OAI has such a position. The best you could hope for is being a marginal support for a safety-based coup (which has already been attempted, and failed).
“~0 likelihood” means that you are nearly certain that OAI does not have such a position (ie, your usage of “likelihood” has the same meaning as “degree of certainty” or “strength of belief”)? I’m being pedantic because I’m not a probability expert and AFAIK “likelihood” has some technical usage in probability.
If you’re up for answering more questions like this, then how likely do you believe it is that OAI has a position where at least 90% of people who are both (A) qualified skill-wise (eg, ML and interpretability expert) and (B) believe that AIXR is a serious problem would increase safety faster than capabilities in that position?
There’s a different question of “could a strategic person advance net safety by working at OpenAI, more so than any other option?”. I believe people like that exist, but they don’t need 80k to tell them about OpenAI.
This is a good point and you mentioning it updates me towards believing that you are more motivated by (1) finding out what’s true regarding AIXR and (2) reducing AIXR, than something like (3) shit talking OAI.
how likely do you believe it is that OAI has a position where at least 90% of people who are both (A) qualified skill-wise (eg, ML and interpretability expert) and (B) believe that AIXR is a serious problem would increase safety faster than capabilities in that position?
The cheap answer here is 0, because I don’t think there is any position where that level of skill and belief in AIXR has a 90% chance of increasing net safety. Ability to do meaningful work in this field is rarer than that.
So the real question is how does OpenAI compare to other possibilities? To be specific, let’s say being an LTFF-funded solo researcher, academia, and working at Anthropic.
Working at OpenAI seems much more likely to boost capabilities than solo research and probably academia. Some of that is because they’re both less likely to do anything. But that’s because they face OOM less pressure to produce anything, which is an advantage in this case. LTFF is not a pressure- or fad-free zone, but they have nothing near the leverage of paying someone millions of dollars, or providing tens of hours each week surrounded by people who are also paid millions of dollars to believe they’re doing safe work.
I feel less certain about Anthropic. It doesn’t have any of the terrible signs OpenAI did (like the repeated safety exoduses, the board coup, and clawbacks on employee equity), but we didn’t know about most of those a year ago.
If we’re talking about a generic skilled and concerned person, probably the most valuable thing they can do is support someone with good research vision. My impression is that these people are more abundant at Anthropic than OpenAI, especially after the latest exodus, but I could be wrong. This isn’t a crux for me for the 80k board[1] but it is a crux for how much good could be done in the role.
Some additional bits of my model:
I doubt OpenAI is going to tell a dedicated safetyist they’re off the safety team and on direct capabilities. But the distinction is not always obvious, and employees will be very motivated to not fight OpenAI on marginal cases.
You know those people who stand too close, so you back away, and then they move closer? Your choices in that situation are to steel yourself for an intense battle, accept the distance they want, or leave. Employers can easily pull that off at scale. They make the question become “am I sure this will never be helpful to safety?” rather than “what is the expected safety value of this research?”
Alternate frame: How many times will an entry level engineer get to say no before he’s fired?
I have a friend who worked at OAI. They’d done all the right soul searching and concluded they were doing good alignment work. Then they quit, and a few months later were aghast at concerns they’d previously dismissed. Once you are in the situation, it is very hard to maintain accurate perceptions.
Something @Buck said made me realize I was conflating “produce useful theoretical safety work” with “improve the safety of OpenAI’s products.” I don’t think OpenAI will stop production for safety reasons[2], but they might fund theoretical work that is useful to others, or that is cheap to follow themselves (perhaps because it boosts capabilities as well...).
This is a good point and you mentioning it updates me towards believing that you are more motivated by (1) finding out what’s true regarding AIXR and (2) reducing AIXR, than something like (3) shit talking OAI.
Thank you. My internal experience is that my concerns stem from around x-risk (and belatedly the wage theft). But OpenAI has enough signs of harm and enough signs of hiding harm that I’m fine shit talking as a side effect, where normally I’d try for something more cooperative and with lines of retreat.
I think the clawbacks are disqualifying on their own, even if they had no safety implications. They stole money from employees! That’s one of the top 5 signs you’re in a bad workplace. 80k doesn’t even mention this.
To ballpark quantify: I think there is <5% chance that OpenAI slows production by 20% or more in order to reduce AIXR. And I believe frontier AI companies need to be prepared to slow by more than that.
IMO “this role is a safety role” isn’t that strong evidence of the role involving research aimed at catastrophic AI risk, but the rest of the description of a particular role probably does provide pretty strong evidence.
Hm. Can I request tabooing the phrase “genuine safety role” in favor of more detailed description of the work that’s done? There’s broad disagreement about which kinds of research are (or should count as) “AI safety”, and what’s required for that to succeed.
Tangent:
I asked a related question a few months ago, ie, if one becomes doom pilled while working as an executive at an AI lab and one strongly values survival, what should one do?
I suspect that would provide some value, but did you mean to respond to @Elizabeth?
I was just trying to use the term as a synonym for “actual safety role” as @Elizabeth used it in her original comment.
This part of your comment seems accurate to me, but I’m not a domain expert.