It seems to me that the distinction between “alignment” and “misalignment” has become something of a motte and bailey. Historical arguments that AIs would be misaligned used the term in sense 1: “AIs having sufficiently general and large-scale motivations that they acquire the instrumental goal of killing all humans (or equivalently bad behaviour)”. Now people are using the word in sense 2: “AIs not quite doing what we want them to do”.
There’s an identical problem with “friendliness”. Sometimes unfriendliness means we all die, sometimes it means we don’t get utopia.