But that talk appears to use the narrower meaning, not the crazy-broad one from the later Arbital page. Looking at the transcript:
The first usage is “At the point where we say, ‘OK, this robot’s utility function is misaligned with our utility function. How do we fix that in a way that it doesn’t just break again later?’ we are doing AI alignment theory,” which seems like it’s really about the goal the agent is pursuing.
The subproblems are all about agents having the right goals. And it consistently talks about pointing agents in the right direction when describing informally what alignment is.
It doesn’t say anything about there being other parts of alignment that Eliezer just doesn’t care about. It really feels like “alignment” is supposed to be understood as getting your AI to not be trying to kill you / to be trying to help you / something about its goals.
The talk doesn’t have any definitions to disabuse you of this apparent implication.
What part of this talk makes it seem clear that alignment is about the broader thing rather than about making an AI that’s not actively trying to kill you?
FWIW, I didn’t mean to kick off a historical debate, which seems like probably not a very valuable use of y’all’s time.