I think that “AI Alignment” is a useful label for the somewhat related problems around P1-P6. Having a term for the broader thing seems really useful.
Of course, sometimes you want labels to refer to a fairly narrow thing, like the label “Continuum Hypothesis”. But broad labels are generally useful. Take “ethics”, another broad field label: normative ethics, applied ethics, meta-ethics, descriptive ethics, value theory, moral psychology, et cetera. If someone tells me “I study ethics”, this narrows down what problems they are likely to work on, but not very much. Perhaps they work out a QALY-based system for assigning organ donations, or study the moral beliefs of some peoples, or argue about whether moral imperatives have truth values. Still, the label confers a lot of useful information over a broader label like “philosophy”.
By contrast, “AI Alignment” still seems rather narrow. P2, for example, seems a mostly instrumental goal: if we have interpretability, we have better chances of avoiding a takeover by an unaligned AI. P3 seems helpful but insufficient for good long-term outcomes: an AI prone to disobeying users or interpreting their orders in a hostile way would—absent some other mechanism—also fail to follow human values more broadly, but a P3-aligned AI in the hands of a bad human actor could still cause extinction, and I agree that social structures should probably be established to ensure that nobody can unilaterally assign the core task (or utility function) of an ASI.
I would agree that it would be good and reasonable to have a term to refer to the family of scientific and philosophical problems spanned by this space. At the same time, as the post says, the issue is when there is semantic dilution, people talking past each other, and coordination-inhibiting ambiguity.
Now take a look at something I could check with a simple search: an ICML workshop that uses the term “alignment” mostly to mean P3 (task-reliability): https://arlet-workshop.github.io/
One might want to use “alignment” one way or the other, and be careful about its limited overlap with P3 in our own registers. But once the larger AI community has picked up on the use-semantics of “RLHF is an alignment technique” and associated alignment primarily with task-reliability, you’d need some deliberate linguistic intervention to clear the air.