Seth Herd comments on If we solve alignment, do we die anyway?

Seth Herd 25 Aug 2024 0:26 UTC
4 points
2
This is true; value alignment is quite possible. But if it’s both harder/less safe, and people would rather align their godling with their own values/commands, I think we should either expect this or make very strong arguments against it.
- Vladimir_Nesov 25 Aug 2024 1:27 UTC
  5 points
  1
  Parent
  Respect for autonomy is not quite value alignment, just as corrigibility is not quite alignment. I’m pointing out that it might be possible to get a good outcome out of strong optimization without value alignment, because strong optimization can be sensitive to the context of the past and so doesn’t naturally result in a past-insensitive tiling of the universe according to its values. Mostly it’s a thought experiment investigating some intuitions about what strong optimization has to be like, and thus about the importance and difficulty of targeting it precisely at particular values.
  
  Not being a likely outcome is a separate issue; for example, I don’t expect intent alignment in its undifferentiated form to remain secure enough to contain AI-originating agency. To the extent intent alignment grants arbitrary wishes, what I describe is an ingredient of a possible wish, one that’s distinct from value alignment and sidesteps the question of “alignment to whom” in a way different from both CEV and corrigibility. It’s not more clearly specified than CEV either, but it’s distinct from it.
  - Seth Herd 25 Aug 2024 19:32 UTC
    4 points
    2
    Parent
    In your use of respect for autonomy as a goal, are you referring to something like Empowerment is (almost) All We Need? I do find that to be an appealing alignment target. (I think I’m using alignment slightly more broadly, as in Hubinger’s definition; I have a post in progress on the terminology of different alignment/goal targets and the resulting confusions.)
    The problem with empowerment as an ASI goal is, once again: empowering whom? And do you empower them to make more like them that you then have to empower? Roger Dearnaley notes that if we empower everyone, humans will probably lose out to either something with less volition but using fewer resources, like insects, or something with more volition to empower, like other ASIs. Do we really want to limit the future to baseline humans? And how do we handle humans that want to create tons more humans?
    See 4. A Moral Case for Evolved-Sapience-Chauvinism and 5. Moral Value for Sentient Animals? Alas, Not Yet from Roger’s AI, Alignment, and Ethics sequence.
    I actually do expect intent alignment to remain secure enough to contain AI-originating agency, as long as it’s the primary goal or “singular target”. It’s counterintuitive that a superintelligent being could want nothing more than to do what its principal wants it to do, but I think it’s coherent. And the more competent it gets, the better it will be at doing what you want and nothing more. Before it’s that competent, the principal can give more careful instructions, including instructions to check before acting, and to help with its alignment in various ways.
    I agree that respect for autonomy/empowerment is one instruction/intent you could give. I do expect that someone will turn their intent-aligned AGI into an autonomous AGI at some point; hopefully after they’re quite confident in its alignment and the worth of that goal.
    - Vladimir_Nesov 27 Aug 2024 4:55 UTC
      2 points
      0
      Parent
      Respect for autonomy is not quite empowerment; it’s more like being left alone. The use of this concept is more in defining what it means for an agent or a civilization to develop relatively undisturbed, without getting overwritten by external influence, not in considering ways of helping it develop. So it’s also a building block for defining extrapolated volition, because that involves an extended period of not getting destroyed by external influences. But it’s conceptually prior to extrapolated volition: it doesn’t depend on already knowing what extrapolated volition is, and it’s a simpler notion.
      
      It’s not by itself a good singular target to set an AI to pursue; for example, it doesn’t protect humans from building more extinction-worthy AIs within their membranes, and it doesn’t facilitate any sort of empowerment. But it seems simple enough and agreeable enough as a universal norm to be a plausible aspect of many naturally developing AI goals, and it doesn’t require absence of interaction, so it allows empowerment etc. if that is also something others provide.