I think the overloading is actually worse than is discussed in this post, because people also sometimes use the term AI alignment to refer to “ensuring that AIs don’t cause bad outcomes via whatever means”.
Under this problematic definition, it is possible to ensure “alignment” by using approaches like AI control despite the AI system desperately wanting to kill you. (At least it’s technically possible; it might not be possible in practice.)
Personally, I think that alignment should be used with the definition as presented in this post by Ajeya (which I also linked in another comment):
Can we find ways of developing powerful AI systems such that (to the extent that they’re “trying” to do anything or “want” anything at all), they’re always “trying their best” to do what their designers want them to do, and “really want” to be helpful to their designers?
It’s possible that we need to pick a new word for this, because “alignment” is too overloaded (e.g. “AI Aimability”, as discussed in this post).
ETA: I think a term like “safety” should be used for “ensuring that AIs don’t cause bad outcomes via whatever means”. To refer more specifically to preventing AI takeover (as opposed to more mundane harms), we could use “takeover prevention” or perhaps “existential safety”.