I’m not sure what order the history happened in, or whether “AI Existential Safety” got rebranded into “AI Alignment” (my impression is that “AI Alignment” was first used to mean existential safety, and maybe this was a bad term, but it wasn’t a rebrand).
There was a pretty extensive discussion about this between Paul Christiano and me. tl;dr “AI Alignment” clearly had a broader (but not very precise) meaning than “How to get AI systems to try to do what we want” when it first came into use. Paul later used “AI Alignment” for his narrower meaning, but after that discussion, switched to using “Intent Alignment” for this instead.
I don’t think I really agree with this summary. Your main justification was that Eliezer used the term with an extremely broad definition on Arbital, but the Arbital page was written way after a bunch of other usage (including after me moving to ai-alignment.com, I think). I think very few people at the time would have argued that e.g. “getting your AI to be better at politics so it doesn’t accidentally start a war” is value alignment, though it obviously fits under Eliezer’s definition.
(ETA: actually the Arbital page is old; it just wasn’t indexed by the Wayback Machine and doesn’t come with a date on Arbital itself. So I agree with the point that this post is evidence for an earlier, very broad usage.)
I would agree with “some people used it more broadly” but not “clearly had a broader meaning.” Unless “broader meaning” is just “used very vaguely such that there was no agreement about what it means.”
(I don’t think this really matters except for the periodic post complaining about linguistic drift.)
Eliezer used “AI alignment” as early as 2016 and ai-alignment.com wasn’t registered until 2017. Any other usage of the term that potentially predates Eliezer?
But that 2016 talk appears to use the narrower meaning, not the crazy broad one from the later Arbital page. Looking at the transcript:
The first usage is “At the point where we say, ‘OK, this robot’s utility function is misaligned with our utility function. How do we fix that in a way that it doesn’t just break again later?’ we are doing AI alignment theory.” That seems like it’s really about the goal the agent is pursuing.
The subproblems are all about agents having the right goals, and whenever the talk describes alignment informally, it talks about pointing agents in the right direction.
It doesn’t say that there are other parts of alignment that Eliezer just doesn’t care about. It really feels like “alignment” is supposed to be understood as getting your AI to not be trying to kill you / to be trying to help you / something about its goals.
The talk doesn’t have any definitions to disabuse you of this apparent implication.
What part of this talk makes it seem clear that alignment is about the broader thing rather than about making an AI that’s not actively trying to kill you?
FWIW, I didn’t mean to kick off a historical debate, which probably isn’t a very valuable use of y’all’s time.