AI safety researchers might be allocated too heavily to Anthropic compared to Google DeepMind
Some considerations:
Safety researchers should want Google DeepMind (GDM) to have a robust and flourishing safety department. It seems plausible that GDM will be able to create “the smartest” models: they have lots of talent and own lots of computers (see e.g. https://epochai.org/data/notable-ai-models#computing-capacity).
Anthropic (ANT) might run into trouble in the future due to not owning their own computers, e.g. if Amazon (or wherever they’re renting their computers from) starts their own internal scaling competitor and decides to stop renting out most of their compute.
ANT has a stronger safety culture, and so it is a more pleasant experience to work at ANT for the average safety researcher. This suggests that there might be a systematic bias towards ANT that pulls away from the “optimal allocation”.
GDM only recently started a Bay Area-based safety research team/lab (with members like Alex Turner). So if people had previously decided to work for ANT based on location, they now have the opportunity to work for GDM without relocating.
I’ve heard that many safety researchers join ANT without considering working for GDM, which seems like an error, although I don’t have first-hand evidence for this being true.
ANT vs GDM is probably a less important consideration than “scaling lab” (ANT, OAI, GDM, XAI, etc.) vs “non-scaling lab” (USAISI, UKAISI, Redwood, ARC, Palisade, METR, MATS, etc. (so many...)). I would advise people to think hard about how joining a scaling lab might inhibit their future careers by e.g. creating a perception they are “corrupted” [edit: I mean viewed as corrupted by the broader world in situations where e.g. there is a non-existential AI disaster or there is rising dislike of the way AI is being handled by corporations more broadly, e.g. similar to how working for an oil company might result in various climate people thinking you’re corrupted, even if you were trying to get the oil company to reduce emissions, etc. I personally do not think GDM or ANT safety people are “corrupted”] (in addition to strengthening the labs, which I expect people to spend more time thinking about by default).
Because ANT has a stronger safety culture, doing safety at GDM involves more politics and navigating around bureaucracy, and thus might be less productive. This consideration applies most if you think the impact of your work is mostly through the object-level research you do, which I think is possible but not that plausible.
(Thanks to Neel Nanda for inspiring this post, and Ryan Greenblatt for comments.)
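> ANT has a stronger safety culture, and so it is a more pleasant experience to work at ANT for the average safety researcher. This suggests that there might be a systematic bias towards ANT that pulls away from the “optimal allocation”.
I think this depends on whether you think AI safety at a lab is more of an O-ring process or a Swiss-cheese process. Also, if you think it’s more of an O-ring process, you might be generally less excited about working at a scaling lab.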
Some possible counterpoints:
Centralization might actually be good if you believe there are compounding returns to having lots of really strong safety researchers in one spot working together, e.g. in terms of having other really good people to work with, learn from, and give you feedback.
My guess would be that Anthropic resources its safety teams substantially more than GDM in terms of e.g. compute per researcher (though I’m not positive of this).
I think the object-level research productivity concerns probably dominate, but if you’re thinking about influence instead, it’s still not clear to me that GDM is better. GDM is a much larger, more bureaucratic organization, which makes it a lot harder to influence. So influencing Anthropic might just be much more tractable.
I think two major cruxes for me here are:
Is it actually tractable to affect DeepMind’s culture and organizational decisionmaking?
How close to the threshold is Anthropic for having a good enough safety culture?
My current best guess is that Anthropic is still under the threshold for good enough safety culture (despite seeming better than I expected in a number of ways), and meanwhile that DeepMind is just too intractably far gone.
I think people should be hesitant to work at any scaling lab, but I think it might be possible to make Anthropic “the one actually good scaling lab”, and I don’t currently expect that to be tractable at DeepMind. I think “having at least one” seems good for the world (although it’s a bit hard for me to articulate why at the moment).
I am interested in hearing details about DeepMind that anyone thinks should change my mind about this.
This viewpoint is based on having spent at least tens of hours trying to learn about and influence both orgs’ cultures, at various times.
In both cases, I don’t get the sense that people at the orgs really have a visceral sense that “decisionmaking processes can be fake”. I think they will be fake by default and the org is better modeled as following general incentives, and DeepMind has too many moving people and moving parts at a low enough density that it doesn’t seem possible to fix. For me to change my mind about this, I would need someone there to look me in the eye and explain that they do have a visceral sense of how organizational decisionmaking processes can be fake, and why they nonetheless think DeepMind is tractable to fix. I assume @Rohin Shah and @Neel Nanda can’t really say anything publicly that’s capable of changing my mind, for various confidentiality and political reasons, but, like, that’s my crux.
(Convincing me in more general terms, “Ray, you’re too pessimistic about org culture”, would hypothetically somehow work, but you have a lot of work to do given how thoroughly those pessimistic predictions came true about OpenAI.)
I think Anthropic also has this problem, but it is close enough to the threshold of almost-aligned leadership and actually-pretty-aligned people that it feels at least possible to me for them to fix it. The main thing that would persuade me that they are over the critical threshold is if they publicly spent social capital on clearly spelling out why the x-risk problem is hard, and made explicit plans to not merely pause for a bit when they hit an RSP threshold, but (at least in some circumstances) advocate strongly for global government shutdown for like 20+ years.
I think your pessimism about org culture is pretty relevant for the question of big decisions that GDM may make, but I think there is absolutely still a case to be made for the value of alignment research conducted wherever. If the research ends up published, then the origin shouldn’t be held too much against it.
So yes, having a few more researchers at GDM doesn’t solve the corporate race problem, but I don’t think it worsens it either.
As for pausing, I think it’s a terrible idea. I’m pretty confident that any sort of large-scale pause would be focused on compute thresholds, and would be worse than not pausing because it would shift research pressure towards algorithmic efficiency. More on that here: https://www.lesswrong.com/posts/Kobbt3nQgv3yn29pr/my-theory-of-change-for-working-in-ai-healthtech?commentId=qwixG4xYeFdELb2GJ
It might be “fine” to do research at GDM (depending on how free you are to actually pursue good research directions, or how good a mentor you have). But, part of the schema in Mark’s post is “where should one go for actively good second-order effects?”.
I largely agree with this take & also think that people often aren’t aware of some of GDM’s bright spots from a safety perspective. My guess is that most people overestimate the degree to which ANT>GDM from a safety perspective.
For example, I think GDM has been thinking more about international coordination than ANT. Demis has said that he supports a “CERN for AI” model, and GDM’s governance team (led by Allan Dafoe) has written a few pieces about international coordination proposals.
ANT has said very little about international coordination. It’s much harder to get a sense of where ANT’s policy team is at. My guess is that they are less enthusiastic about international coordination relative to GDM and more enthusiastic about things like RSPs, safety cases, and letting scaling labs continue unless/until there is clearer empirical evidence of loss of control risks.
I also think GDM deserves some praise for engaging publicly with arguments about AGI ruin and threat models.
(On the other hand, GDM is ultimately controlled by Google, which makes it unclear how important Demis’s opinions or Allan’s work will be. Also, my impression is that Google was neutral or against SB1047, whereas ANT eventually said that the benefits outweighed the costs.)
Great post. I’m on GDM’s new AI safety and alignment team in the Bay Area and hope readers will consider joining us!
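> I would advise people to think hard about how joining a scaling lab might inhibit their future careers by e.g. creating a perception they are “corrupted”
What evidence is there that working at a scaling lab risks creating a “corrupted” perception? When I try thinking of examples, the people that come to my mind seem to have quite successfully transitioned from working at a scaling lab to doing nonprofit / government work. For example: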
Paul Christiano went from OpenAI to the nonprofit Alignment Research Center (ARC) to head of AI safety at the US AI Safety Institute.
Geoffrey Irving worked at Google Brain, OpenAI, and Google DeepMind. Geoffrey is now Chief Scientist at the UK AI Safety Institute.
Beth Barnes worked at DeepMind and OpenAI and is now founder and head of research at Model Evaluation and Threat Research (METR).
I was intending to warn about the possibility of future perception of corruption, e.g. after a non-existential AI catastrophe. I do not think anyone currently working on safety teams is perceived as that “corrupted”, although I do think there is mild negative sentiment among some online communities (some parts of Twitter, Reddit, etc.).
> think hard about how joining a scaling lab might inhibit their future careers by e.g. creating a perception they are “corrupted”
Does this mean something like:
1. People who join scaling labs can have their values drift, and future safety employers will suspect by default that ex-scaling-lab staff have had their values drift, or
2. If there is a non-existential AGI disaster, scaling lab staff will be looked down upon,
or something else entirely?
Basically (2), with very small amounts of (1) (perhaps qualitatively similar to the amount of (1) you would apply to e.g. people joining US AISI or UK AISI).
The high-level claim seems pretty true to me. Come to the GDM alignment team, it’s great over here! It seems quite important to me that all AGI labs have good safety teams.
Thanks for writing the post!