Not a direct answer to your question, but I want to flag that using “AI alignment” to mean “AI [x-risk] safety” seems like a mistake. Alignment means getting the AI to do what its principal/designer wants, which is not identical to averting AI x-risks (much less s-risks). There are plausible arguments that alignment is sufficient to avert such risks, but it’s an open question, so I think equating the two is confusing.
I agree, one can conceive of AGI safety without alignment (e.g. if boxing worked), and one can conceive of alignment without safety (e.g. if the AI is “trying to do the right thing” but is careless or incompetent or whatever). I usually use the term “AGI Safety” when describing my job, but the major part of it is thinking about the alignment problem.
Thanks for flagging this.
I presumed that “AI alignment” was being used as a shorthand for x-risks from AI, but I hadn’t considered that distinction. I’m also not aware that anyone from the rationality community whom I’ve seen make this kind of statement really meant “AI alignment” to cover all x-risks from AI. That was my mistake. I’ll presume they’re referring only to the control problem and edit my post to clarify that.
As I understand it, s-risks are a sub-class of x-risks: an existential risk is not only a risk of extinction but any risk that the future trajectory of Earth-originating intelligence is permanently and irreversibly altered for the worse.