For me, another major con is that this decomposition moves some problems that I think are crucial and urgent out of “AI alignment” and into the “competence” part, with the implicit or explicit implication that they are not as important: for example, the problem of obtaining, or helping humans to obtain, a better understanding of their values and of defending their values against manipulation from other AIs.
I think it’s bad to use a definitional move to try to implicitly prioritize or deprioritize research. I think I shouldn’t have written: “I like it less because it includes many subproblems that I think (a) are much less urgent, (b) are likely to involve totally different techniques than the urgent part of alignment.”
That said, I do think it’s important that these seem like conceptually different problems and that different people can have different views about their relative importance—I really want to discuss them separately, try to solve them separately, compare their relative values (and separate that from attempts to work on either).
I don’t think it’s obvious that alignment is higher priority than these problems, or than other aspects of safety. I mostly think it’s a useful category to be able to talk about separately. In general I think it’s good to be able to separate conceptually distinct categories, and I care about that especially in this case because I care a great deal about this problem. But I also grant that the term has inertia behind it, so choosing its definition is a bit loaded, and someone could object on those grounds even if they bought that it was a useful separation.
(I think that “defending their values against manipulation from other AIs” wasn’t included under any of the definitions of “alignment” proposed by Rob in our email discussion about possible definitions, so it doesn’t seem totally correct to refer to this as “moving” those subproblems, so much as there already being a mess of imprecise definitions, some of which included and some of which excluded those subproblems.)