Being nice because you’re altruistic, and being even nicer for decision-theoretic reasons on top of that, seems like it involves some kind of double-counting: the reason you’re altruistic in the first place is that evolution ingrained the decision theory into your values.
But it’s not fully double-counting: many humans generalise altruism in a way which leads them to “cooperate” far more than is decision-theoretically rational for the selfish parts of them—e.g. by making big sacrifices for animals, future people, etc. I guess this could be selfishly rational if you subscribe to a very strong form of updatelessness, but I am very skeptical that we’ll discover arguments that this much updatelessness is rationally obligatory.
A very speculative takeaway: maybe “how updateless you are” and “how altruistic you are” are kinda measuring the same thing, and there’s no clean split between whether that’s determined by your values or your decision theory.
Your actions and decisions are not doubled. If you have multiple paths to arriving at the same behavior, that doesn’t make them wrong or double-counted; it just makes it hard to tell which of them is causal (i.e., your behavior is overdetermined).
Are you using “updatelessness” to refer to not having a self-interest term in your utility function? If so, that’s a new one on me, and I’d prefer “altruism” as the term. I’m not sure that the decision-theory use of “updateless” (to avoid incorrect predictions where experience is correlated with the question at hand) makes sense here.
Oh, this also suggests a way in which the utility function abstraction is leaky, because the reasons for the payoffs in a game may matter. E.g. if one payoff is high because the corresponding agent is altruistic, then in some sense that agent is “already cooperating” in a way which is baked into the game, and so the rational thing for them to do might be different from the rational thing for another agent who gets the same payoffs, but for “selfish” reasons.
Maybe FDT already lumps this effect into the “how correlated are decisions” bucket? Idk.
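To make the “same behaviour, different reasons” point concrete, here’s a toy sketch (my own construction, nothing canonical): in a standard Prisoner’s Dilemma you can push an agent toward cooperating either by adding an altruism weight to its utility function, or by increasing how correlated it treats the two players’ decisions as being (a crude stand-in for the updateless/FDT story, not FDT proper). Past a threshold on either knob, cooperation wins, which is part of why “how altruistic you are” and “how updateless you are” can look like they’re measuring the same thing.

```python
# Toy Prisoner's Dilemma sketch (my own construction): two different knobs,
# an altruism weight in the utility function vs. an assumed correlation
# between the players' decisions, either of which can flip the agent from
# defecting to cooperating.

# Material payoffs (my_payoff, their_payoff) indexed by (my_action, their_action).
MATERIAL = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def altruistic_cdt_choice(altruism_weight, their_action):
    """Causal-style choice: hold the other player's action fixed and maximise
    own material payoff plus altruism_weight * their material payoff."""
    def utility(my_action):
        mine, theirs = MATERIAL[(my_action, their_action)]
        return mine + altruism_weight * theirs
    return max(("C", "D"), key=utility)

def selfish_correlated_choice(correlation):
    """Acausal-flavoured choice for a purely selfish agent: assume the other
    player mirrors my action with probability `correlation` (and plays the
    opposite otherwise), then maximise expected own material payoff."""
    def expected_utility(my_action):
        mirror = my_action
        anti = "D" if my_action == "C" else "C"
        return (correlation * MATERIAL[(my_action, mirror)][0]
                + (1 - correlation) * MATERIAL[(my_action, anti)][0])
    return max(("C", "D"), key=expected_utility)

if __name__ == "__main__":
    for w in (0.0, 0.7, 1.0):
        choices = {their: altruistic_cdt_choice(w, their) for their in ("C", "D")}
        print(f"altruism weight {w}: best responses {choices}")
    for p in (0.5, 0.72, 1.0):
        print(f"assumed correlation {p}: choice {selfish_correlated_choice(p)}")
    # With these numbers, an altruism weight above 2/3 makes C dominant, and
    # an assumed correlation above 5/7 makes the selfish agent cooperate too:
    # two different stories that land on the same behaviour.
```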