Edit:
User davidad wrote a comprehensive overview of other actors in the field who have begun using "AI alignment" instead of "AI safety" as the standard term for the control problem. This appears to be a general or growing preference in the field, though not a complete consensus. That may only be because of inertia from several years ago, before the control problem was as clearly distinguished from other AI-related concerns as it is now. Sometimes the term "control problem" is simply used instead of either of the other terms.
I originally characterized the control/alignment problem as synonymous with any x-risk from AI. User antimonyanthony clarified that the control problem is not the only way AI may pose an existential risk. I've edited this post accordingly.
During conversations about x-risks from AI in communities broader than the rationality or x-risk communities, such as effective altruism or social media, I've seen Eliezer Yudkowsky and Ben Pace clarify that the preferred term for the control problem is "AI alignment." I understand this is to distinguish specifically existential risks from AI from the other ethical and security concerns about AI that "AI safety" has come to encompass. Yet I've only seen people involved in x-risk work who come from the rationality community say this is the preferred term. The main reason for that might be that the majority of people I know working on anything that could be called either AI alignment or AI safety are also in the rationality community.
Is there any social cluster in the professional/academic/whatever AI communities, other than the x-risk reduction cluster around the rationality community, that prefers this terminology?
The term “AI alignment” can be traced to the longer phrase “the value alignment problem” found in a Nov 2014 essay by Stuart Russell, whence it was picked up by Rob Bensinger, then adopted by Eliezer in 2015, and used by Paul in 2016. Although Paul still preferred the name “AI control” in 2016 for the medium-scope problem of ensuring that AI systems “don’t competently pursue the wrong thing”, he renamed his blog from AI Control to AI Alignment at some point between 2016 and 2018. “AI Alignment” really took off when it was adopted by Rohin for his first Newsletter in April 2018 and incorporated in the name of the Alignment Forum in July 2018. Wikipedians renamed “motivation control” to “alignment” in April 2020, and Brian Christian’s The Alignment Problem came out in October 2020.
Digging deeper, “value alignment” was also the subject of a 2002(!) AAAI paper by Shapiro and Shachter (which also anticipates Everitt’s use of causal influence diagrams in alignment research); it seems plausible that this was a cause of Russell’s 2014 use of the phrase, or not.
Anyway, the 2002 paper never really caught on (18 citations to date), and Russell has never consistently used the word “alignment”, later calling the problem “robustly beneficial AI”, then “provably beneficial AI”, and finally settling on “the problem of control” (as in the subtitle of his 2019 book) or “the control problem”. So the result is that pretty much every contemporary use of “AI alignment” is memetically downstream of MIRI, at least partially. However, that watershed includes OpenAI (where Jan Leike’s official title is “alignment team lead”, and there are job postings for the Alignment team), DeepMind (which has published papers about “Alignment”), a cluster at UC Berkeley, and scattered researchers in Europe (Netherlands, Finland, Cambridge, Moscow,...).
Strongly upvoted. Thanks for your comprehensive review. This might be the best answer I’ve ever received for any question I’ve asked on LW.
In my opinion, given that these other actors who've adopted the term arguably lead the field even more than MIRI does, it's valid for someone in the rationality community to claim it's in fact the preferred term. A more accurate statement would be:
There is a general or growing preference for the term AI alignment to be used instead of AI safety to refer to the control problem.
There isn't a complete consensus on this, but there may not be a good reason for that; it may only be due to inertia in the field from years ago, when the control problem wasn't distinguished as often from other ethics or security concerns about advanced AI.
Clarifying all of that isn't necessary by default, but it would be worth mentioning if anyone asks which organizations or researchers beyond MIRI also agree.
As someone who teaches undergraduates a bit about AI Safety/alignment in my economics of future technology course at Smith College, I much prefer "AI safety," as the term is far clearer to people unfamiliar with the issues.
"AI alignment" is the term MIRI (among other actors in the field) ostensibly prefers over "AI safety" for referring to the control problem, to distinguish it from other AI-related ethics or security issues, since those other issues don't constitute x-risks. Of course, the extra jargon could be confusing for a large audience being exposed to AI safety and alignment concerns for the first time. When introducing the field to prospective entrants or students, keeping it simpler as you do may well be the better way to go.