Instrumental convergence makes differences in values hard to notice, so there can be abundant examples of misalignment that remain unobtrusive. The differences only become a glaring problem once there is enough inequality of power that coercing or outright overwriting others becomes feasible (Fnargl only reaches the coercing stage, not the overwriting stage). Thus even differences in values between humans and randomly orthogonal AGIs can seem non-threatening until they aren't, just as differences in human values can remain irrelevant for average urban dwellers.
Alignment on values becomes crucial given overwriting-level power differences: a thought experiment that puts a human in that position predicts a decent chance of non-doom, even though it's not a good plan on its own. Conversely, unfettered superintelligent paperclip maximizers or Fnargls leave no survivors. And since the world is very far from an equilibrium of abundant superintelligence, there are going to be unprecedented power differentials while it settles. This makes the common sense impression that misalignment on values is irrelevant (because it's masked by instrumental convergence) misleading when it comes to AGI.