To be rewarded (and even more so “maximally rewarded”) is to be given something you actually want (and the reverse for being punished). That’s the definition of what a reward/punishment is. You don’t “choose” to want/not want it, any more than you “choose” your utility function. It just is what it is. Being “rewarded” with something you don’t want is a contradiction in terms: at best someone tried to reward you, but that attempt failed.
I see your argument. You are saying that a “maximal reward” is, by definition, something that gives us the maximum utility out of all possible actions, and so, by definition, it is our purpose in life.
But utility is actually a function of both the reward (getting two golden bricks) and the act it rewards (murdering my child), not merely a function of the reward itself (getting two golden bricks).
And so, for many possible commands I could be given (“you have to murder your child”), there is no possible reward that would give me more utility than not obeying the command.
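To put the same point in rough symbols (a sketch, not a formal proof): write $U(\text{act}, \text{reward})$ for the utility of performing an act and receiving a reward. The claim is that for some commands $c$,

$$\max_{r}\, U(\text{obey}(c),\, r) \;<\; U(\text{refuse}(c),\, \text{nothing}),$$

that is, no reward $r$, however generous, lifts the obey branch above the refuse branch, because the act itself enters the utility function alongside the reward.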
For that reason, the mere fact that someone will maximally reward me for obeying them doesn’t make their commands my objective purpose in life.
Of course, one can respond “but then, by definition, they aren’t maximally rewarding you”, and on that definition the response is correct. The problem is that the set of all possible commands for which I can’t (on that definition) be maximally rewarded is so vast that the statement “if someone maximally rewards/punishes you, their orders are your purpose in life” becomes meaningless.
Not true: the reward could include having all of the unwanted consequences of following the command divinely reverted a fraction of a second later.
That wouldn’t help. The utility would then be calculated from (getting two golden bricks) and (murdering my child for a fraction of a second), which still yields lower utility than not following the command.
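In the same rough notation as before: divine reversion only swaps $\text{obey}(c)$ for $\text{obey-then-revert}(c)$ in the first argument, so the inequality $U(\text{obey-then-revert}(c),\, r) < U(\text{refuse}(c),\, \text{nothing})$ can still hold for every possible reward $r$; the act, however briefly its consequences last, still gets evaluated.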
The set of possible commands for which I can’t be maximally rewarded remains too vast for the statement to be meaningful.
This sounds absurd to me. Unless, of course, you’re taking the “two golden bricks” literally, in which case I invite you to substitute “saving 1 billion other lives” and see whether your position still stands.