It seems to me that the distinction between “alignment” and “misalignment” has become something of a motte and bailey. Historical arguments that AIs would be misaligned used the term in sense 1: “AIs having sufficiently general and large-scale motivations that they acquire the instrumental goal of killing all humans (or equivalently bad behaviour)”. Now people are using the word in sense 2: “AIs not quite doing what we want them to do”.
There’s an identical problem with “friendliness”. Sometimes unfriendliness means we all die, sometimes it means we don’t get utopia.