I’d define “genuine safety role” as “any qualified person will increase safety faster than capabilities in the role”. I put ~0 likelihood that OpenAI has such a position.
Which of the following claims are you making?
OpenAI doesn’t have any roles doing AI safety research aimed at reducing catastrophic risk from egregious AI misalignment; people who think they’re taking such a role will end up assigned to other tasks instead.
OpenAI does have roles where people do AI safety research aimed at reducing catastrophic risk from egregious AI misalignment, but all the research done by people in those roles sucks and the roles contribute to OpenAI having a good reputation, so taking those roles is net negative.
I find the first claim pretty implausible. E.g. I think that the recent SAE paper and the recent scalable oversight paper obviously count as attempts at AI safety research. I think that people who take roles where they expect to work on research like that basically haven’t ended up unwillingly shifted to roles on e.g. safety systems, core capabilities research, or product stuff.
I’m not Elizabeth or Ray, but there’s a third option which I read the comment above to mean, and which I myself find plausible.
OpenAI does have roles that are ostensibly aimed at reducing catastrophic risk from egregious AI misalignment. However, without more information, an outsider should not expect that those roles actually accelerate safety more than they accelerate capabilities.
Successfully increasing safety faster than capabilities requires the person in that role to have a number of specific skills (e.g. political savvy, robustness to social pressure, a higher-granularity strategic/technical model than most EAs have in practice, etc.), over and above the skills that would be required to get hired for the role.
Lacking those skills, a hire for such a role is more likely to do harm than good, not primarily because they’ll be transitioned to other tasks, but because much of the work that the typical hire for such a role would end up doing either 1) doesn’t help or 2) will end up boosting OpenAI’s general capabilities more than it helps.
Furthermore, by working at OpenAI at all, they provide some legitimacy to the org as a whole, and to the existentially dangerous work happening in other parts of it, even if their own work does 0 direct harm. Someone working in such a role has to do sufficiently beneficial on-net work to overcome this baseline effect.
I’m not Elizabeth and probably wouldn’t have worded my thoughts quite the same, but my own position regarding your first bullet point is:
“When I see OpenAI list a ‘safety’ role, I’m like 55% confident that it has much to do with existential safety, and maybe 25% that it produces more existential safety than existential harm.”
When you say “when I see OpenAI list a ‘safety’ role”, are you talking about roles related to superalignment, or are you talking about all roles that have safety in the name? Obviously OpenAI has many roles that are aimed at various near-term safety stuff, and those might have safety in the name, but this isn’t duplicitous in the slightest—the job descriptions (and maybe even the rest of the job titles!) explain it perfectly clearly so it’s totally fine.
I assume you meant something like “when I see OpenAI list a role that seems to be focused on existential safety, I’m like 55% that it has much to do with existential safety”? In that case, I think your number is too low.
I was thinking of things like the Alignment Research Science role. If they talked up “this is a superalignment role”, I’d have an estimate higher than 55%.
We are seeking Researchers to help design and implement experiments for alignment research. Responsibilities may include:
Writing performant and clean code for ML training
Independently running and analyzing ML experiments to diagnose problems and understand which changes are real improvements
Writing clean non-ML code, for example when building interfaces to let workers interact with our models or pipelines for managing human data
Collaborating closely with a small team to balance the need for flexibility and iteration speed in research with the need for stability and reliability in a complex long-lived project
Understanding our high-level research roadmap to help plan and prioritize future experiments
Designing novel approaches for using LLMs in alignment research
Yeah, I think that this is disambiguated by the description of the team:
OpenAI’s Alignment Science research teams are working on technical approaches to ensure that AI systems reliably follow human intent even as their capabilities scale beyond human ability to directly supervise them.
We focus on researching alignment methods that scale and improve as AI capabilities grow. This is one component of several long-term alignment and safety research efforts at OpenAI, which we will provide more details about in the future.
So my guess is that you would call this an alignment role (except for the possibility that the team disappears because of superalignment-collapse-related drama).
Yeah, I read those lines, and also “Want to use your engineering skills to push the frontiers of what state-of-the-art language models can accomplish”, and remain skeptical. I think OpenAI tends to equivocate on how they use the word “alignment” (or: they use it consistently, but not in a way that I consider obviously good). For example, I think the people working on RLHF a few years ago probably contributed to ChatGPT being released earlier, which I think was bad.*
*I like the part where the world feels like it’s actually starting to respond to AI now, but I think that would have happened later, with more serial-time for various other research to solidify.
(I think this is a broader difference in guesses about what research/approaches are good, which I’m not actually very confident about, esp. compared to habryka, but it’s where I’m currently coming from.)
*I like the part where the world feels like it’s actually starting to respond to AI now, but I think that would have happened later, with more serial-time for various other research to solidify.
And with less serial-time for various policy plans to solidify and gain momentum.
If you think we’re irreparably far behind on the technical research, and advocacy / political action is relatively more promising, you might prefer to trade years of timeline for earlier, more widespread awareness of the importance of AI, and a longer period of people pushing on policy plans.
Good question. My revised belief is that OpenAI will not sufficiently slow down production in order to boost safety. It may still produce theoretical safety work that is useful to others, and to itself if the changes are cheap to implement.
I do also expect many people assigned to safety to end up doing more work on capabilities, because the distinction is not always obvious and they will have so many reasons to err in the direction of agreeing with their boss’s instructions.
Ok, but I feel like if a job mostly involves researching x-risk-motivated safety techniques and then publishing them, it’s very reasonable to call it an x-risk-safety research job, regardless of how likely the organization where you work is to eventually adopt your research when it builds dangerous AI.