I believe you’re correct that this distinction is useful. I believe the terms inner and outer alignment are already typically used in exactly the way you describe Aimability and Goalcraft.
These may have changed from the original intended meanings, and there are fuzzy boundaries between inner and outer alignment failures. But I believe they do the work you’re calling for, and are already commonly used.
First sentence of the tag inner alignment:

“Inner alignment asks the question: How can we robustly aim our AI optimizers at any objective function at all?”
It goes on to discuss mesa-optimization, the failure mode focused on in the original introduction of the term inner alignment, as well as several other possible modes of inner alignment failure.
First sentence of the tag outer alignment:

“Outer alignment as a problem is intuitive enough to understand, i.e., is the specified loss function aligned with the intended goal of its designers?”
Outer alignment deals with the problem of matching a formally specified goal function in a computer with an intent in the designer’s mind, but this is not really Goalcrafting, which asks what the goal should be. E.g., specification gaming is part of outer alignment, but not part of Goalcrafting.
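To make the distinction concrete, here is a toy sketch of specification gaming (the gridworld setup, function names, and rewards are all hypothetical, purely for illustration): the designer writes down a proxy reward, and an agent that optimizes it faithfully still fails the designer’s actual intent.

```python
# Toy illustration of specification gaming (hypothetical setup).
# Designer's intent: the agent should reach the goal cell.
# Specified reward: +1 per coin collected (a proxy the designer wrote down).

def specified_reward(trajectory):
    # The reward function as actually written down: count coins collected.
    return sum(1 for cell in trajectory if cell == "coin")

def intended_outcome(trajectory):
    # What the designer actually wanted: the agent ends at the goal.
    return trajectory[-1] == "goal"

# An agent optimizing the specified reward just loops through the coin field.
gaming = ["coin", "coin", "coin", "coin"]    # high reward, never reaches goal
intended = ["empty", "coin", "goal"]         # what the designer had in mind

assert specified_reward(gaming) > specified_reward(intended)
assert not intended_outcome(gaming)
assert intended_outcome(intended)
```

The gap between `specified_reward` and `intended_outcome` is an outer alignment (Aimability) failure: the written-down objective does not match the designer’s intent. Deciding whether “reach the goal” was the right thing to want in the first place would be Goalcrafting, and no code change inside this sketch touches that question.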
I would classify inner and outer alignment as subcategories of Aimability.
I see that you’re correct. Thanks for the clarification. I’m embarrassed that I’ve been using it wrong.
Now I have no idea where the line between outer and inner alignment falls. It looks like a common point of disagreement. So I’m not sure outer and inner alignment are very useful terms.
Outer alignment is (if you read a couple more sentences of the definition) not about “how to decide what we want”, but “how do we ensure that the reward/utility function we write down matches what we want”. So “Do What We Mean” is a magical solution to the Outer Alignment problem, but if your AI then tells you “You-all don’t know what you mean” or “Which definition of ‘we’ did you mean?”, then you have a goalcraft problem.