I’m a bit uncomfortable with [edit: wrong word. All discussion is good, but I think there’s a big modeling error being made.] a lot of the discussion about ethics/morals around here, in that it’s often treated as a domain separate from, and incomparable to, other motives/values/utility sources.
It may be, in humans and/or in AI, that there are fairly distinct modules for different domains, which somehow bid or negotiate over how much each of them influences a given decision. This would make the “morals as distinct evaluations” approach reasonable, though the actual negotiation and the power weights among modules seem like the more important things to study.
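To make the “modules bidding for influence” picture concrete, here is a minimal toy sketch, assuming fixed power weights and made-up module scores (all names and numbers below are hypothetical, not a claim about any real human or AI architecture):

```python
# Toy sketch of the "modules bidding" picture: each module scores candidate
# actions, and fixed "power weights" settle the negotiation.

def choose(actions, modules, weights):
    """Pick the action with the highest weighted sum of module scores."""
    def total(action):
        return sum(weights[name] * score(action) for name, score in modules.items())
    return max(actions, key=total)

# Made-up modules and weights, purely for illustration.
modules = {
    "morals":   lambda a: {"donate": 1.0,  "defect": -1.0, "rest": 0.0}[a],
    "status":   lambda a: {"donate": 0.3,  "defect": 0.8,  "rest": 0.0}[a],
    "pleasure": lambda a: {"donate": -0.2, "defect": 0.1,  "rest": 0.6}[a],
}
weights = {"morals": 0.5, "status": 0.3, "pleasure": 0.2}

print(choose(["donate", "defect", "rest"], modules, weights))  # -> donate
```

Notice that once the weights are fixed, the “negotiation” is just a single weighted scoring function, which is essentially the point made next.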
But if something approaches VNM-rationality, it acts as if it has a unified, consistent utility function. That means an integrated set of values and motivations, not separate models of “what’s the right action in this context for me”.
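For reference, the von Neumann–Morgenstern theorem is what licenses that “acts as if”: if preferences over lotteries satisfy completeness, transitivity, continuity, and independence, then there is a single utility function u such that

$$L \succeq M \iff \mathbb{E}[u(L)] \ge \mathbb{E}[u(M)],$$

with u unique up to a positive affine transformation.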
So, either AI is irrational (like humans), or exploration of morality as a separate topic from status, pleasure, family, or other non-altruistic drives is not very applicable.
I’m pretty sure I mostly agree. However:
It is not obvious that VNM rationality is quite the right thing here. I talked to a game theorist at an event a while back, and he said that one of the core assumptions of VNM rationality is not accepted by everyone, but I’ve had trouble re-locating which one he meant. In any case, Garrabrant has said he doesn’t accept VNM rationality for related but (iirc) slightly different reasons. Still, iirc, none of that completely invalidates combining everything into a single ranking function.
It might make sense to demand that morality be clearly spelled out in one of the theorems that constrain which utility functions we’d accept, though. That’s a little different from it being truly separate.
I concede that VNM may be too high a bar, and I’ve seen some clever things done by relaxing the independence axiom. I don’t think I’ve seen any that change the basic problem of (acting as if there were) a single value function that the agent is optimizing. To be rational, there can be no separation of moral and non-moral domains.
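For reference, the independence axiom (the one usually relaxed) says that for all lotteries L, M, N and any p in (0, 1]:

$$L \succ M \implies pL + (1-p)N \succ pM + (1-p)N.$$

Relaxations of this kind typically keep completeness and transitivity, so the agent is still left with one overall ranking of options rather than separate moral and non-moral rankings.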