Upvoted, this was exactly my reaction to this post. However, you may want to look at the link to alignment in the OP. Christiano is using “alignment” in a very narrow sense. For example, from the linked post:
The definition is intended de dicto rather than de re. An aligned A is trying to “do what H wants it to do.” Suppose A thinks that H likes apples, and so goes to the store to buy some apples, but H really prefers oranges. I’d call this behavior aligned because A is trying to do what H wants, even though the thing it is trying to do (“buy apples”) turns out not to be what H wants: the de re interpretation is false but the de dicto interpretation is true.
… which rings at least slightly uncomfortably to my ears.
Well then, isn’t the answer that we care about de re alignment, and that whether an AI is de dicto aligned matters only insofar as it predicts de re alignment? We might expect the two to converge in the limit of superintelligence, and perhaps aiming for de dicto alignment is the easier immediate target, but the moral worth would be a function of what the AI actually did.
That does clear up the seeming confusion behind the OP, though, so thanks!