I believe you’re correct that this distinction is useful. I believe the terms inner and outer alignment are already typically used in exactly the way you describe Aimability and Goalcraft.
These may have changed from the original intended meanings, and there are fuzzy boundaries between inner and outer alignment failures. But I believe they do the work you’re calling for, and are already commonly used.
First sentence of the tag inner alignment:

“Inner alignment asks the question: How can we robustly aim our AI optimizers at any objective function at all?”
It goes on to discuss mesa-optimization, the failure mode focused on in the original introduction of the term inner alignment, as well as several other possible modes of inner alignment failure.
First sentence of the tag outer alignment:

“Outer alignment as a problem is intuitive enough to understand, i.e., is the specified loss function aligned with the intended goal of its designers?”
Outer alignment deals with the problem of matching a formally specified goal function in a computer with an intent in the designer’s mind, but this is not really Goalcrafting, which asks what the goal should be. E.g., specification gaming is part of outer alignment, but not part of Goalcrafting.
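To make the distinction concrete, here is a toy sketch of specification gaming (the gridworld setup, function names, and rewards are all hypothetical, purely for illustration): the designer writes down a proxy reward, and an agent that optimizes it faithfully still fails the designer’s actual intent.

```python
# Toy illustration of specification gaming (hypothetical setup).
# Designer's intent: the agent should reach the goal cell.
# Specified reward: +1 per coin collected (a proxy the designer wrote down).

def specified_reward(trajectory):
    # The reward function as actually written down: count coins collected.
    return sum(1 for cell in trajectory if cell == "coin")

def intended_outcome(trajectory):
    # What the designer actually wanted: the agent ends at the goal.
    return trajectory[-1] == "goal"

# An agent optimizing the specified reward just loops through the coin field.
gaming = ["coin", "coin", "coin", "coin"]    # high reward, never reaches goal
intended = ["empty", "coin", "goal"]         # what the designer had in mind

assert specified_reward(gaming) > specified_reward(intended)
assert not intended_outcome(gaming)
assert intended_outcome(intended)
```

The gap between `specified_reward` and `intended_outcome` is an outer alignment (Aimability) failure: the written-down objective does not match the designer’s intent. Deciding whether “reach the goal” was the right thing to want in the first place would be Goalcrafting, and no code change inside this sketch touches that question.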
I would classify inner and outer alignment as subcategories of Aimability.
I see that you’re correct. Thanks for the clarification. I’m embarrassed that I’ve been using it wrong.
Now I have no idea where the line between outer and inner alignment falls. It looks like a common point of disagreement. So I’m not sure outer and inner alignment are very useful terms.
Outer alignment is (if you read a couple more sentences of the definition) not about “how to decide what we want”, but “how do we ensure that the reward/utility function we write down matches what we want”. So “Do What We Mean” is a magical solution to the Outer Alignment problem, but if your AI then tells you “You-all don’t know what you mean” or “Which definition of ‘we’ did you mean?”, then you have a goalcraft problem.