“When we refer to “aligned AI”, we are using Paul Christiano’s conception of “intent alignment”, which essentially means the AI system is trying to do what its human operators want it to do.”
Reading this makes me think that the risk of catastrophe due to human use of AGI is higher than I had thought.
In a world where AGI is not agentic but is ubiquitous, I can easily see people telling "their" AGIs to "destroy X" or "do something about Y", with catastrophic results. (And attempts to prevent such outcomes could also have catastrophic results, for similar reasons.)
So you may need to substantively align AGI (i.e., have AGI with substantive values or hard-to-alter restrictions) even if the AGI itself does not have agency or goals.
Interbeing alignment, not AI alignment: when everyone's super, no one will be.