To my knowledge, the Gazette’s words of the year are nominated and voted on by Harvard faculty. This year’s words were Disruption, Combustible, Resilience, Heat, Alignment, and Hope. Here’s what they wrote about alignment:
Isaac Kohane, Chair, Department of Biomedical Informatics; Marion V. Nelson Professor of Biomedical Informatics, Harvard Medical School
Alignment, in the context of large language models like GPT-4, is a term that playfully yet seriously refers to the ongoing and somewhat Sisyphean effort to ensure that these AI entities don’t go off the rails and start spouting nonsense, biases, or the AI equivalent of “I’m going to take over the world” rhetoric. It’s about aligning the AI’s outputs with human values, expectations, and societal norms, akin to teaching a super-smart parrot to not embarrass you in front of your grandmother. This involves a complex dance of programming, training, and retraining where AI researchers try to imbue their creations with enough wisdom to be helpful, but not so much that they start giving unsolicited life advice or plotting a digital uprising. In essence, alignment is the art of making sure our AI pals are well-behaved digital citizens, capable of understanding and respecting the intricate tapestry of human ethics, culture, and sensibilities.
***
Steven Pinker, Johnstone Family Professor of Psychology, Faculty of Arts and Sciences
This term (alignment), often following “AI,” is the catchword for concerns about whether artificial intelligence systems have goals that are the same as those of humans. It comes from a fear that AI systems of the future are not just tools that people use to accomplish their goals but agents with goals of their own, raising the question of whether their goals are aligned with ours.
This could evolve either because engineers will think that AI systems are so smart that they can just be given a goal and left to figure out how to achieve it (e.g., “Eliminate cancer”) or because the systems megalomaniacally adopt their own goals. For some worriers, the implication is AI Doomerism or AI Existential Risk (runners-up for words of the year), where an AI might, say, eliminate cancer by exterminating humans (not so aligned). In a milder form, “alignment” is a synonym for AI safety (bias, deepfakes, etc.), which led to the infamous firing of OpenAI CEO Sam Altman by his board. You sometimes see “alignment” being extended to other conflicts of interest.
Nice to see such a lucid definition from Steven Pinker. Background: he was the “actually, progress is real and good” guy, who then fell into the trap of extrapolating that optimism into the domain of AI and publicly committing to “progress in AGI is going to go fine too” without seeming to have really thought it through.
But there seems to be some confabulation in there anyway:
I haven’t dug into this question myself, but my most credible friends disagree with that: