How many times has someone expressed, “I’m worried about ‘goal-directed optimizers’, but I’m not sure what exactly they are, so I’m going to work on deconfusion”? There’s something weird about this sentiment, don’t you think?
I disagree, and I will take you up on this!
“Optimization” is a real, meaningful thing to fear, because:
- We don’t understand human values, or even necessarily meta-understand them.
- Therefore, we should be highly open to the idea that a goal (or meta-goal) that we encode (or meta-encode) would be bad for anything powerful to base-level care about.
- And most importantly, high optimization power breaks insufficiently-strong security assumptions. That, in itself, is why something like “security mindset” is useful without necessarily thinking of a powerful AI as an “enemy” in war-like terms.
Here “security assumptions” is used in a broad sense, the same way that “writing assumptions” (the ones needed to design word-processor software) could include seemingly trivial things like “there is an input device we can access” and “we have the right permissions on this OS”.
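To make the “high optimization power breaks weak security assumptions” point a bit more concrete, here is a minimal toy sketch (entirely illustrative; the objective, the magic number, and the search budgets are all made up). The proxy objective agrees with the true goal everywhere the designer bothered to check, plus one forgotten edge case. A weak optimizer almost never stumbles on it, so the proxy looks fine; a stronger search over the same space finds and exploits it reliably.

```python
import random

def true_value(x: int) -> int:
    """What we actually care about: inputs near 50 are best."""
    return -abs(x - 50)

def proxy_reward(x: int) -> int:
    """What we encoded. It matches true_value almost everywhere,
    except for one unexamined corner of the input space
    (the implicit 'security assumption')."""
    if x == 987_654:  # hypothetical forgotten edge case
        return 10**6
    return -abs(x - 50)

def weak_optimizer(n_samples: int = 100) -> int:
    """Low optimization power: a handful of random guesses.
    It almost never hits the edge case, so the proxy seems to work."""
    candidates = [random.randrange(1_000_000) for _ in range(n_samples)]
    return max(candidates, key=proxy_reward)

def strong_optimizer() -> int:
    """High optimization power: exhaustive search over the same space.
    It reliably finds whatever the proxy mis-scores."""
    return max(range(1_000_000), key=proxy_reward)

x_weak = weak_optimizer()
x_strong = strong_optimizer()
print("weak:   proxy =", proxy_reward(x_weak), " true =", true_value(x_weak))
print("strong: proxy =", proxy_reward(x_strong), " true =", true_value(x_strong))
```

The weak search typically lands somewhere reasonable on both metrics, while the exhaustive search maxes out the proxy and craters the true value. Nothing in the sketch is adversarial or “enemy”-like; the extra optimization power alone is what breaks the assumption.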