My opinions:
Too many galaxy-brained arguments & not enough empiricism
Agree. Although there is sometimes a tradeoff between direct empirical testability and relevance to long-term alignment.
Agree. Thinking about mathematical models for agency seems fine because it is fundamental and theorems can get you real understanding, but the more complicated and less elegant your models get, and the more tangential they are to the core question of how AI and instrumental convergence work, the less likely they are to be useful.
Evan Hubinger pushed back against this view by defending MIRI’s research approach. [...] we had no highly capable general-purpose models to do experiments on
Some empirical work could have happened well before the shift to empiricism around 2021. FAR AI’s Go attack work could have happened shortly after LeelaZero was released in 2017, as could interpretability on non-general-purpose models.
Too insular
Many in AI safety have been too quick to dismiss the concerns of AI ethicists [… b]ut AI ethics has many overlaps with AI safety, both technically and in policy:
Undecided; I used to believe this but then heard that AI ethicists have been uncooperative when alignment people try to reach out. But maybe we are just bad at politics and coalition-building.
AI safety needs more contact with academia. [...] research typically receives less peer review, leading to on average lower quality posts on sites like LessWrong. Much of AI safety research lacks the feedback loops that typical science has.
Agree; I also think that the research methodology and aesthetic of academic machine learning have been underappreciated (although they are clearly not perfect). Historically some good ideas, like the LDT paper, were rejected by journals, but it is definitely true that many things you do for the sake of publishing actually make your science better, e.g. having both theory and empirical results, or putting your contributions in an ontology people understand. I did not really understand how research worked until attending ICML last year.
Many of the computer science and math kids in AI safety do not value insights from other disciplines enough [....] Norms and values are the equilibria of interactions between individuals, produced by their behaviors, not some static list of rules up in the sky somewhere.
Plausible but with reservations:
I think “interdisciplinary” can be a buzzword that invites a lot of bad research
Thinking of human values as a utility function can be a useful simplifying assumption in developing basic theory
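For concreteness, here is a minimal sketch of what I mean by that simplifying assumption: values as a real-valued utility function over outcomes, with the agent choosing the action that maximizes expected utility. The notation below (u for the utility function, O for outcomes, A for actions, P for the agent's beliefs) is purely illustrative, not something from the survey or the original post.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Minimal sketch (illustrative only): "human values as a utility function".
% u assigns a real number to each outcome; the agent picks the action that
% maximizes expected utility under its beliefs P(o | a).
\[
  u : \mathcal{O} \to \mathbb{R},
  \qquad
  a^{*} \in \operatorname*{arg\,max}_{a \in \mathcal{A}}
  \; \mathbb{E}_{o \sim P(\,\cdot\, \mid a)}\!\left[\, u(o) \,\right]
\]
% Under the von Neumann--Morgenstern axioms (completeness, transitivity,
% continuity, independence), preferences over lotteries can be represented
% by such a u, unique up to positive affine transformation -- which is why
% this is a convenient starting point for basic theory, even though actual
% human values need not satisfy the axioms.
\end{document}
```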
[...] too much jargony and sci-fi language. Esoteric phrases like “p(doom)”, “x-risk” or “HPMOR” can be off-putting to outsiders and a barrier to newcomers, and give culty vibes.
Disagree. This is the useful kind of jargon; “x-risk” is a concept we really want in our vocabulary, and it is not clear how to make it sound less weird. If AI safety people are off-putting to outsiders, it is because we need to be more charismatic and better at communication.
Ajeya Cotra thought some AI safety researchers, like those at MIRI, have been too secretive about the results of their research.
Agree; I think there was a mindset where, since MIRI’s plan for saving the world required them to reach the frontier of AI research with far safer (e.g. non-ML) designs, they believed their AI capabilities ideas were better than they actually were.
Holly Elmore suspected that this insular behavior was not by mistake, but on purpose. The rationalists wanted to only work with those who see things the same way as them, and avoid too many “dumb” people getting involved.
Undecided; this has not been my experience. I do think people should recognize that AI safety has been heavily influenced by what is essentially a trauma response from being ignored by the scientific establishment from 2003 to 2023.
Bad messaging
6 respondents thought AI safety could communicate better with the wider world.
Agree. It’s wild to me that e/acc and AI safety seem memetically evenly matched on Twitter (could be wrong about this, if so someone please correct me) while e/acc has a worse favorability rating than Scientology in surveys.
4 thought that some voices push views that are too extreme or weird
I think Eliezer’s confidence is not the worst thing because in most fields there are scientists who are super overconfident. But probably he should be better at communication, e.g. realizing that people will react negatively to raising the possibility of bombing datacenters without lots of contextualizing. Undecided on Pause AI and Conjecture.
Ben Cottier lamented the low quality of discourse around AI safety, especially in places like Twitter.
I’m pretty sure a large part of this is some self-perpetuating thing where participating in higher-quality discourse on LW or, better, your workplace Slack is more fun than Twitter. Not sure what to do here. Agree about polarization, but it’s not clear what to do there either.
AI safety’s relationship with the leading AGI companies
3 respondents also complained that the AI safety community is too cozy with the big AGI companies. A lot of AI safety researchers work at OpenAI, Anthropic and DeepMind. The judgments of these researchers may be biased by a conflict of interest: they may be incentivised for their company to succeed in getting to AGI first. They will also be contractually limited in what they can say about their (former) employer, in some cases even for life.
Agree about conflicts of interest. I remember hearing that at one of the international AI safety dialogues, every academic signed but no one with a purely corporate affiliation did. There should be some way for safety researchers to divest their equity rather than give it up / donate it and lose 85% of their net worth, but conflicts of interest will remain.
The bandwagon
Many in the AI safety movement do not think enough for themselves, 4 respondents thought.
Slightly agree I guess? I don’t really have thoughts. It makes sense that Alex thinks this because he often disagrees with other safety researchers—not to discredit his position.
Discounting public outreach & governance as a route to safety
Historically, the AI safety movement has underestimated the potential of getting the public on-side and getting policy passed, 3 people said. There is a lot of work in AI governance these days, but for a long time most in AI safety considered it a dead end. The only hope to reduce existential risk from AI was to solve the technical problems ourselves, and hope that those who develop the first AGI implement them. Jamie put this down to a general mistrust of governments in rationalist circles, not enough faith in our ability to solve coordination problems, and a general dislike of “consensus views”.
I think this is largely due to a mistake by Yudkowsky, which is maybe compatible with Jamie’s opinions.
I also want to raise the possibility that the technical focus was rational and correct at the time. Early MIRI/CFAR rationalists were nerds with maybe −1.5 standard deviations of political aptitude on average. So I think it is likely that they would have failed at their policy goals, and maybe even caused three more counterproductive events like the Puerto Rico conference where OpenAI was founded. Later, AI safety started attracting political types, and maybe that was the right time to start doing policy.
[Holly] also condemned the way many in AI safety hoped to solve the alignment problem via “elite shady back-room deals”, like influencing the values of the first AGI system by getting into powerful positions in the relevant AI companies.
It doesn’t sound anywhere near as shady if you phrase it as “build a safety-focused culture or influence decisions at companies that will build the first AGI”, which seems more accurate.
Mostly due to a feeling of looking down on people imo.
I thought it was mostly due to the high prevalence of autism (and the social anxiety that usually comes with it) in the community. The more socially agentic rationalists are trying.
But probably he should be better at communication e.g. realizing that people will react negatively to raising the possibility of bombing datacenters without lots of contextualizing.
I’m confident he knew people would react negatively but decided to keep the line because he thought it was worth the cost.
Seems like a mistake by his own lights IMO.
But probably he should be better at communication e.g. realizing that people will react negatively to raising the possibility of nuking datacenters without lots of contextualizing.
Yeah, pretty sure Eliezer never recommended nuking datacenters. I don’t know who you heard it from, but this distortion is slanderous and needs to stop. I can’t control what everybody says elsewhere, but it shouldn’t be acceptable on LessWrong, of all places.
He did talk about enforcing a global treaty backed by the threat of force (because all law is ultimately backed by violence, don’t pretend otherwise). He did mention that destroying “rogue” datacenters (conventionally, by “airstrike”) to enforce said treaty had to be on the table, even if the target datacenter is located in a nuclear power who might retaliate (possibly risking a nuclear exchange), because risking unfriendly AI is worse.
He did talk about enforcing a global treaty backed by the threat of force (because all law is ultimately backed by violence, don’t pretend otherwise)
Most international treaties are not backed by military force, such as the threat of airstrikes. They’re typically backed by more informal pressures, such as diplomatic isolation, conditional aid, sanctions, asset freezing, damage to credibility and reputation, and threats of mutual defection (i.e., “if you don’t follow the treaty, then I won’t either”). It seems bad to me that Eliezer’s article incidentally amplified the idea that most international treaties are backed by straightforward threats of war, because that idea is not true.
Thanks, fixed.