Being nice because you’re altruistic, and being even nicer for decision-theoretic reasons on top of that, seems like it involves some kind of double-counting: the reason you’re altruistic in the first place is that evolution ingrained the decision theory into your values.
But it’s not fully double-counting: many humans generalise altruism in a way which leads them to “cooperate” far more than is decision-theoretically rational for the selfish parts of them—e.g. by making big sacrifices for animals, future people, etc. I guess this could be selfishly rational if you subscribe to a very strong form of updatelessness, but I am very skeptical that we’ll discover arguments that this much updatelessness is rationally obligatory.
A very speculative takeaway: maybe “how updateless you are” and “how altruistic you are” are kinda measuring the same thing, and there’s no clean split between whether that’s determined by your values or your decision theory.
Your actions and decisions are not doubled. If you have multiple paths to arrive at the same behaviors, that doesn’t make them wrong or double-counted, it just makes it hard to tell which of them is causal (aka: your behavior is overdetermined).
Are you using “updatelessness” to refer to not having yourself in your utility function? If so, that’s a new one on me, and I’d prefer “altruism” as the term. I’m not sure that the decision-theory use of “updateless” (to avoid incorrect predictions where experience is correlated with the question at hand) makes sense here.
Yeah, I don’t think “double counted” is the right term here.
Consider: if I just personally like the taste of kale, I’ll eat kale. If I also find out that kale is especially healthy, I have additional reason to eat kale, compared to alternatives that are not as healthy.
Surely there’s some correlation and causal relationship between what I find tasty and what is healthy. Nutrition (and avoiding poisoning) is the main reason taste evolved!
But that doesn’t mean that taste and health aren’t separate reasons for me to prefer to eat kale.
Oh, this also suggests a way in which the utility function abstraction is leaky, because the reasons for the payoffs in a game may matter. E.g. if one payoff is high because the corresponding agent is altruistic, then in some sense that agent is “already cooperating” in a way which is baked into the game, and so the rational thing for them to do might be different from the rational thing for another agent who gets the same payoffs, but for “selfish” reasons.
Maybe FDT already lumps this effect into the “how correlated are decisions” bucket? Idk.
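To make that leak concrete, here’s a minimal sketch in Python (my own toy numbers and an assumed altruism weight `alpha`, nothing from the thread): transforming a Prisoner’s Dilemma’s material payoffs into the payoffs an altruist actually experiences, and noting that the resulting matrix alone doesn’t record why it looks that way.

```python
# Toy sketch: "altruism baked into the payoffs" in a Prisoner's Dilemma.
# The numbers and the altruism weight `alpha` are illustrative assumptions.
import numpy as np

# Row player's material payoffs; rows/cols are (cooperate, defect).
# By symmetry, the column player's material payoffs are the transpose.
material = np.array([[3.0, 0.0],
                     [5.0, 1.0]])

def experienced_payoffs(material, alpha):
    """Payoffs the row player actually cares about:
    own material payoff plus alpha times the opponent's."""
    return material + alpha * material.T

selfish_view    = experienced_payoffs(material, alpha=0.0)  # defection dominates
altruistic_view = experienced_payoffs(material, alpha=0.8)  # cooperation dominates
                                                            # (5.4 > 5.0, 4.0 > 1.8)

# The leak: handed only `altruistic_view` as "the game", you can't tell whether
# those payoffs come from altruism (so the agent's decision is plausibly
# correlated with other altruists') or from idiosyncratic selfish tastes --
# yet that difference might matter for what it's rational to do.
print(selfish_view)
print(altruistic_view)
```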
Another double-counting: wanting people to be saved, for altruistic reasons, and wanting to personally do things that save people.