What were the other options? Have you considered advising xAI privately, or redirecting xAI to be advised by someone else? Also, would the default be clearly worse?
As you are surely aware, one of the bigger fights about AI safety across academia, policymaking, and public spaces right now is the claim that AI safety is a “distraction” from immediate social harms, and is actually the agenda favoured by the leading labs and technologists. (This often comes with accusations of attempted regulatory capture, worries about concentration of power, etc.)
In my view, given this situation, it is valuable for AI safety to also be represented by somewhat neutral coordination institutions without obvious conflicts of interest and large attack surfaces.
As I wrote in the OP, CAIS made some relatively bold moves to become one of the most visible “public representatives” of AI safety, including the name choice and organizing the widely reported Statement on AI Risk (which was a success). Until now, my impression was that in taking the namespace, you also aimed for CAIS to be such a “somewhat neutral coordination institution without obvious conflicts of interest and large attack surfaces”.
Maybe I was wrong, and you don’t aim for this coordination/representative role. But if you do, advising xAI seems a strange choice for multiple reasons:
1. It makes you a somewhat less neutral party in the eyes of the broader world. Even if the link to xAI does not actually influence your judgement or motivations, I think on priors it’s broadly sensible for policymakers, politicians, and the public to suspect all kinds of activism, advocacy, and lobbying efforts of having side-motivations or conflicts of interest, and this link strengthens that suspicion.
2. The existing public announcements do not inspire confidence in the safety mindset of xAI’s founders, and it is unclear whether you also advised xAI on the “align to curiosity” plan.
3. If xAI turns out to be mostly interested in safety-washing, that is more of a problem when it’s aided by a more central/representative org.
It is unclear to me whether having your name publicly associated with them is good or bad, compared to advising without it being publicly announced.
On one hand, it boosts awareness of CAIS and gives you the opportunity to cause them some amount of negative publicity if you distance yourself at some point. On the other hand, it grants them some license to brush off safety worries by gesturing at your involvement.
No good deed goes unpunished. By default there would likely be no advising.
I am not sure what this comment is responding to.
The only criticism of you and your team in the OP is that you named your team the “Center” for AI Safety, as though you had much history leading safety efforts or had a ton of buy-in from the rest of the field. I don’t believe that either of these is true[1]; it seems to me that the name preceded your engaging in major safety efforts. This power-grab for being the “Center” of the field was a step toward putting you in a position to be publicly interviewed, to sit on advisory boards like this, and to coordinate the signing of a key statement[2]. It is a symmetric tool for gaining power, and doesn’t give me any evidence about whether you having this power is a good or bad thing.
There is also a thread of discussion that I would say hinges on the question of “whether you being an advisor is a substantive change to the org’s direction, or safety-washing”. I don’t really know, and this is why I don’t consider “By default there would likely be no advising” a sufficient argument to show that this is a clearly good deed rather than a bad or neutral one.
For instance, I would be interested to know whether your org name was run past people such as Katja Grace, Paul Christiano, Vika Krakovna, Jan Leike, Scott Garrabrant, Evan Hubinger, or Stuart Armstrong, all of whom have historically made a number of substantive contributions to the field. I currently anticipate that you did not get feedback from most of these people on the team name. I would be pleased to have my beliefs falsified on this question.
I think the statement was a straightforward win and good deed and I support it. Other work of yours I have mixed or negative impressions of.
Dan spent his entire PhD working on AI safety, did some of the most influential work on OOD robustness and OOD detection, and wrote Unsolved Problems. Even if this work is less valued by some readers on LessWrong (imo mistakenly), it seems pretty inaccurate to say that he didn’t work on safety before founding CAIS.
Fwiw, I disagree that “center” carries these connotations of leading the field or having its buy-in. To me it’s more like “place where some activity of a certain kind is carried out”, or even just a synonym of “institute”. (I feel the same about the other 5-10 EA-ish “centers/centres” focused on AI x-risk reduction.) I guess I view these things more as “a center of X” than “the center of X”. Maybe I’m in the minority on this, but I’d be kind of surprised if that were the case.
Interesting. Good point about there being other examples. I’ll list the ones I can find with a quick search, and share my impressions of whether the names are good/bad/neutral.
UC Berkeley’s Center for Human-Compatible AI
Paul Christiano’s Alignment Research Center
Center for Long-Term Risk
Center for Long-Term Resilience
The first feels pretty okay to me because (a) Stuart Russell has a ton of leadership points to spend in the field of AI, given that he wrote the standard textbook (prior to deep learning), and (b) it is within the academic system, where I expect the negotiation norms for naming were figured out many decades ago.
I thought about the second one at the time. It feels slightly like it’s taking central namespace, but I think Paul Christiano is on a shortlist of people who can reasonably take the mantle of foremost alignment researcher, so I am on net supportive of him having this name (and I expect many others in the field are perfectly fine with it too). I also mostly do not expect Paul, who in many ways seems to try to do relatively inoffensive things, to use the name for much politicking.
I don’t have many associations with the third and fourth, and don’t see them as picking up much political capital from the names. Insofar as they’re putting themselves as representative of the ‘longtermism’ flag, I don’t particularly feel connected to that flag and am not personally interested in policing its use. And otherwise, if I made a non-profit called (for example) “The Center for Mathematical Optimization Methods” I think that possibly one or two people would be annoyed if they didn’t like my work, but mostly I don’t think there’s a substantial professional network or field of people that I’d be implicitly representing, and those who knew about my work would be glad that anyone was making an effort.
I’ll repeat that one of my impressions here is that, by picking the name “Center for AI Safety”, Dan is picking up a lot of social and political capital that others have earned, in a field he didn’t build and hadn’t previously led, and that he is aggressively spending that capital in ways that most of the people who did build the field can’t check and don’t know whether they endorse. (With the exception that I believe the signed statement would be endorsed by almost all people in the field, and I’m glad it was executed successfully.)
(As a related gripe, Dan and his team also moved to overwrite the namespace that Drexler of FHI occupied, whose “Comprehensive AI Services” framing was already commonly abbreviated to CAIS, in a way that seemed uncaring/disrespectful to me.)
I think your analysis makes sense if using a “center” name really should require you to have some amount of eminence or credibility first. I’ve updated a little bit in that direction now, but I still mostly think it’s just synonymous with “institute”, and on that view I don’t care if someone takes a “center” name (any more than if someone takes an “institute” name). It’s just, you know, one of the five or so nouns non-profits and think tanks use in their names (“center”, “institute”, “foundation”, “organization”, “council”, blah).
Or actually, maybe it’s more like I’m less convinced that there’s a common pool of social/political capital that CAIS is now spending from. I think the signed statement has resulted in other AI gov actors now having higher chances of getting things done. I think if the statement had been not very successful, it wouldn’t have harmed those actors’ ability to get things done. (Maybe if it had been really botched it would’ve, but then my issue would’ve been with CAIS’s botching of the statement, not with their name.)
I guess I also don’t really buy that using “center” spends from this pool (to the extent that there is a pool). What’s the scarce resource it’s using? Policy-makers’ time/attention? Regular people’s time/attention? Or do people only have a fixed amount of respect or credibility to accord various AI safety orgs? I doubt, for example, that other orgs lost out on opportunities to influence people, or inform policy-makers, due to CAIS’s actions. I guess what I’m trying to say is I’m a bit confused about your model!
Btw, in case it matters, the other examples I had in mind were Center for Security and Emerging Technology (CSET) and Centre for the Governance of AI (GovAI).
I’m finding it hard to explain why I think that naming yourself “The Center for AI Safety” is taking ownership of a prime piece of namespace real estate, and positions you as representing the field far more than calling yourself “Conjecture” or “Redwood Research” or “Anthropic” would.
Like, consider an outsider trying to figure out who to go to in this field to help out or ask questions. They will be more likely to conclude that “The Center for AI Safety” is the natural place to go, rather than “the Future of Humanity Institute”, even though the latter has done a lot more relevant research.
There are basically no other organizations with “AI Safety” in the name, even though when Dan picked the name the others had all done more work to build up the field (e.g. FLI’s Puerto Rico conference, FHI’s book Superintelligence, MIRI’s defining of the field a decade ago, and more).
[Added: I think names should be accurate on Simulacra Level 1 and not Simulacra Level 3-4. For instance, if lots of small and incompetent organizations call themselves “Center” and “Institute” when they aren’t much of a center of anything and aren’t long-lasting institutions, this is bad for our ability to communicate and it isn’t okay even if a lot of people do it. “Organization” seems like a pretty low bar and means something less substantive. “Council” seems odd here and only counts if you are actually primarily a council to other institutions.]
I think it’s pretty clear that there is such a common pool. Of course “the AI Safety field” has a shared reputation and social capital: people under the same flag are all drawing on, and investing resources into, that flag. Scandals in some parishes of the Catholic Church affect the reputation of other Catholic parishes (and of Christians more broadly). The reputation of YC affects the reputation of Silicon Valley and startups more broadly. When I watch great talks by Patrick Collison I become more interested in working in startups and think more highly of them; when I see Sam Bankman-Fried steal billions of dollars I come to think that crypto orgs in general are more likely to be frauds.
I agree that the signed statement gave more capital back to others than it spent; Hendrycks’ team are not in debt in my book. I nonetheless want to name costs and be open about criticism, and also to say that I am a bit nervous about what they will do in their relationship with xAI (e.g. it feels a bit like they might give xAI a tacit endorsement from the AI Safety community, which could easily be exceedingly costly).
As for the scarce resource: for one, in an incognito browser window, their website is the first Google result for the search term “AI Safety”, before even the Wikipedia page.
After seeing The Center for AI Policy and The AI Policy Institute launch within a month of one another, I am now less concerned about first-movers monopolizing the namespace.