The term “AI alignment” can be traced to the longer phrase “the value alignment problem” found in a Nov 2014 essay by Stuart Russell, whence it was picked up by Rob Bensinger, then adopted by Eliezer in 2015, and used by Paul in 2016. Although Paul still preferred the name “AI control” in 2016 for the medium-scope problem of ensuring that AI systems “don’t competently pursue the wrong thing”, he renamed his blog from AI Control to AI Alignment at some point between 2016 and 2018. “AI Alignment” really took off when it was adopted by Rohin for his first Newsletter in April 2018 and incorporated in the name of the Alignment Forum in July 2018. Wikipedians renamed “motivation control” to “alignment” in April 2020, and Brian Christian’s The Alignment Problem came out in October 2020.
Digging deeper, “value alignment” was also the subject of a 2002(!) AAAI paper by Shapiro and Shachter (which also anticipates Everitt’s use of causal influence diagrams in alignment research); it seems plausible, though far from certain, that this paper influenced Russell’s 2014 use of the phrase.
Anyway, the 2002 paper never really caught on (18 citations to date), and Russell has never consistently used the word “alignment”, later calling the problem “robustly beneficial AI”, then “provably beneficial AI”, and finally settling on “the problem of control” (as in the subtitle of his 2019 book) or “the control problem”. The result is that pretty much every contemporary use of “AI alignment” is, at least partially, memetically downstream of MIRI. However, that watershed includes OpenAI (where Jan Leike’s official title is “alignment team lead”, and there are job postings for the Alignment team), DeepMind (which has published papers about “Alignment”), a cluster at UC Berkeley, and scattered researchers in Europe (Netherlands, Finland, Cambridge, Moscow,...).
Strongly upvoted. Thanks for your comprehensive review. This might be the best answer I’ve ever received for any question I’ve asked on LW.
In my opinion, given that these other actors who’ve adopted the term are arguably more prominent leaders in the field than MIRI, it’s valid for someone in the rationality community to claim that “AI alignment” is in fact the preferred term. A more accurate statement would be:
There is a general or growing preference for the term “AI alignment” to be used instead of “AI safety” to refer to the control problem.
There isn’t complete consensus on this, but that may not be for any good reason; it may simply be inertia in the field from years ago, when the control problem wasn’t distinguished as often from other ethics or security concerns about advanced AI.
Clarifying all of that by default isn’t necessary, but it would be worth mentioning if anyone asks which organizations or researchers beyond MIRI also agree with this usage.