Outer alignment is (if you read a couple more sentences of the definition) not about “how to decide what we want”, but “how do we ensure that the reward/utility function we write down matches what we want”. So “Do What We Mean” is a magical solution to the Outer Alignment problem, but if your AI then tells you “You-all don’t know what you mean” or “Which definition of ‘we’ did you mean?”, then you have a goalcraft problem.