Having established the nature of the different utility functions, it’s pretty simple to show how the gnomes relate to these. The first key point to make, though, is that there are actually two distinct types of submissive gnomes and it’s important not to confuse the two. This is part of the reason for the confusion over Beluga’s post.
Submissive gnome: I adopt the utility function of any human in my cell, but am completely indifferent otherwise.
Pre-emptively submissive gnome: I adopt the utility function of any human in my cell; if there is no human in my cell I adopt the utility function they would have had if they were here.
The two are different precisely in the key case that Stuart mentioned—the case where there is no human at all in the gnome’s cell. Fortunately, the utility function of the human who will be in the gnome’s cell (which we’ll call “cell B”) is entirely well-defined, because any human who ended up in that cell would have the same utility function. The “would have had” case for the pre-emptively submissive gnomes is a little stranger, but it still makes sense: the gnome’s utility would correspond to the anti-indexical component JU of the human’s utility function U (which, for selfish humans, is just zero). Thus we can remove all of the dangling references in the gnome’s utility function, as per the discussion between Stuart and Beluga. If U is the utility function the human in cell B has (or would have), then the submissive gnome’s utility function is IU (note the indexicalisation!), whereas the pre-emptively submissive gnome’s utility function is simply U.
Following Beluga’s post here, we can use these ideas to translate all of the various utility functions to make them completely objective and observer-independent, although some of them reference cell B specifically. If we refer to the second cell as “cell C”, swapping between the two gnomes is equivalent to swapping B and C. For further simplification, we use $B to refer to the number of dollars in cell B, and o(B) as an indicator function for whether cell B has a human in it (and similarly for cell C). The simplified utility functions are thus
T = $B + $C
A = ($B + $C) / (o(B) + o(C))
S = IS = $B
IT = o(B) ($B + $C)
IA = o(B) ($B + $C) / (o(B) + o(C))
Z = - $C
H = $B - $C
IH = o(B) ($B - $C)
Note that T and A are the only functions that are invariant under swapping B and C.
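(If you want to check that claim mechanically, here is a small Python sketch; the names dB, dC, oB, oC are just stand-ins for $B, $C, o(B), o(C), and the sample outcomes are arbitrary. It’s a sanity check, not part of the argument.)

    # Illustrative only: dB, dC, oB, oC stand for $B, $C, o(B), o(C).
    U = {
        "T":  lambda dB, dC, oB, oC: dB + dC,
        "A":  lambda dB, dC, oB, oC: (dB + dC) / (oB + oC),
        "S":  lambda dB, dC, oB, oC: dB,
        "IT": lambda dB, dC, oB, oC: oB * (dB + dC),
        "IA": lambda dB, dC, oB, oC: oB * (dB + dC) / (oB + oC),
        "Z":  lambda dB, dC, oB, oC: -dC,
        "H":  lambda dB, dC, oB, oC: dB - dC,
        "IH": lambda dB, dC, oB, oC: oB * (dB - dC),
    }
    # A few sample outcomes, each with at least one occupied cell.
    samples = [(1.0, 0.0, 1, 0), (0.0, 1.0, 0, 1), (0.5, 0.25, 1, 1)]
    for name, u in U.items():
        invariant = all(u(dB, dC, oB, oC) == u(dC, dB, oC, oB)
                        for dB, dC, oB, oC in samples)
        print(name, "invariant under swapping B and C" if invariant else "not invariant")
    # Prints "invariant" only for T and A.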
This invariance means that, in both utilitarian cases (total and average) with pre-emptively submissive gnomes, all of the gnomes (including the one in an empty cell) and all of the humans have the same utility function over all possible worlds. Moreover, all of the decisions are obviously linked, and so there is effectively only one decision. Consequently, it’s quite trivial to solve with UDT. Total utilitarianism gives
E[T] = 0.5(-x) + 2*0.5(1-x) = 1-1.5x
with breakeven at x = 2⁄3, and average utilitarianism gives
E[A] = 0.5(-x) + 0.5(1-x) = 0.5-x
with breakeven at x = 1⁄2.
In the selfish case, the gnome ends up with the same utility function whether it’s pre-emptive or not, because IS = S. Also, there is no need to worry about decision linkage, and hence the decision problem is a trivial one. From the gnome’s point of view, 1⁄4 of the time there will be no human in the cell, 1⁄2 of the time there will be a human in the cell and the coin will have come up tails, and 1⁄4 of the time there will be a human in the cell and the coin will have come up heads. Thus
E[S] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
and the breakeven point is x = 2⁄3, as with the total utilitarian case.
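(As a sanity check of all three breakevens, here is a small Python sketch over the gnome’s 1⁄4 / 1⁄4 / 1⁄2 worlds, assuming every gnome advises paying the same price x; the function names are just illustrative.)

    from fractions import Fraction as F
    # The gnome's three worlds: heads with its own cell empty (p = 1/4),
    # heads with its own cell occupied (p = 1/4), tails (p = 1/2).
    # Every gnome is assumed to advise paying the same price x.
    def E_T(x):  # total $: one ticket bought on heads, two on tails
        return F(1, 4) * (-x) + F(1, 4) * (-x) + F(1, 2) * 2 * (1 - x)
    def E_A(x):  # average $ per existing human
        return F(1, 4) * (-x) + F(1, 4) * (-x) + F(1, 2) * (1 - x)
    def E_S(x):  # the human in this gnome's own cell (0 if the cell is empty)
        return F(1, 4) * 0 + F(1, 4) * (-x) + F(1, 2) * (1 - x)
    # The claimed breakevens all evaluate to zero:
    print(E_T(F(2, 3)), E_A(F(1, 2)), E_S(F(2, 3)))  # 0 0 0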
In all of these cases so far, I think the humans quite clearly should follow the advice of the gnomes, because
1) Their utility functions coincide exactly over all a priori possible worlds.
2) The humans do not have any extra information that the gnomes do not.
Now, finally, let’s go over the reasoning that leads to the so-called “incorrect” answers of 4⁄5 and 2⁄3 for total and average utilitarianism. We assume, as before, that the decisions are linked. As per Beluga’s post, the argument goes like this:

With probability 2⁄3, the coin has shown tails. For an average utilitarian, the expected utility after paying $x for a ticket is 1/3*(-x) + 2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x) + 2/3*2*(1-x). Average and total utilitarians should thus pay up to $2⁄3 and $4⁄5, respectively.
So, what’s the problem with this argument? In actual fact, for a submissive gnome, that advice is correct, but the human should not follow it. The problem is that a submissive gnome’s utility function doesn’t coincide with the utility function of the human over all possible worlds, because IT != T and IA != A. The key difference between the two cases is the gnome in the empty cell. If it’s a submissive gnome, then it’s completely indifferent to the plight of the humans; if it’s a pre-emptively submissive gnome then it still cares.
If we were to do the full calculations for the submissive gnome, the gnome’s utility function is IT for total utilitarian humans and IA for average utilitarian humans; since IIT = IT and IIA = IA, the calculations are the same if the humans have indexical utility functions. For IT we get
E[IT] = 0.25(0) + 0.25(-x) + 2*0.5(1-x) = 1-1.25x
with breakeven at x = 4⁄5, and for IA we get
E[IA] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
with breakeven at x = 2⁄3. Thus the submissive gnome’s 2⁄3 and 4⁄5 numbers are correct for the gnome, and indeed if the human’s total/average utilitarianism is indexical they should just follow the advice, because their utility function would then be identical to the gnome’s.
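(The corresponding sanity check for the submissive gnome, with the same worlds and the same caveat that the names are illustrative:)

    from fractions import Fraction as F
    # Same three worlds, but with the submissive gnome's utilities IT and IA,
    # which are zero whenever its own cell is empty.
    def E_IT(x):
        return F(1, 4) * 0 + F(1, 4) * (-x) + F(1, 2) * 2 * (1 - x)  # = 1 - 1.25x
    def E_IA(x):
        return F(1, 4) * 0 + F(1, 4) * (-x) + F(1, 2) * (1 - x)      # = 0.5 - 0.75x
    print(E_IT(F(4, 5)), E_IA(F(2, 3)))  # 0 0: breakevens at 4/5 and 2/3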
So, if this advice is correct for the submissive gnome, why should the pre-emptively submissive gnome’s advice be different? After all, after conditioning on the presence of a human in the cell, the two utility functions are the same. This particular issue is indeed exactly analogous to the mistaken “yea” answer in Psy-Kosh’s non-anthropic problem. Although I side with UDT and/or the precommitment-based reasoning, I think that question warrants further discussion, so I’ll leave that for a third comment.
OK, time for further detail on the problem with pre-emptively submissive gnomes. Let’s focus on the case of total utilitarianism, and begin by looking at the decision in unlinked form, i.e. we assume that the gnome’s advice affects only the human in its own cell, if there is one, and no one otherwise. Conditional on there being a human in cell B, the expected utility of the human in cell B buying a ticket for $x is, indeed, (1/3)(-x) + (2/3)(1-x) = 2⁄3 - x, so the breakeven is obviously at x = 2⁄3. However, if we also assume that the gnome in the other cell will give the same advice, we get (1/3)(-x) + 2(2/3)(1-x) = 4⁄3 - (5/3)x, with breakeven at x = 4⁄5. In actual fact, the gnome’s reasoning, and the 4⁄5 answer, is correct. If tickets were being offered at a price of, say, 75 cents, then the overall outcome (conditional on there being a human in cell B) is indeed better if the humans buy at 75 cents than if they refuse to buy, because 3⁄4 is less than 4⁄5.
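(A small sketch of those two conditional calculations, with P(heads) = 1/3 after conditioning on a human in cell B; “unlinked” and “linked” are just illustrative names.)

    from fractions import Fraction as F
    # Conditional on a human in cell B: P(heads) = 1/3, P(tails) = 2/3.
    def unlinked(x):  # only the cell-B human acts on the advice
        return F(1, 3) * (-x) + F(2, 3) * (1 - x)      # breakeven at 2/3
    def linked(x):    # on tails, the cell-C gnome gives the same advice
        return F(1, 3) * (-x) + F(2, 3) * 2 * (1 - x)  # breakeven at 4/5
    print(unlinked(F(2, 3)), linked(F(4, 5)))  # 0 0
    print(linked(F(3, 4)) > 0)  # True: buying at 75 cents still looks good here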
As I mentioned previously, in the case where the gnome only cares about total $ if there is a human in its cell, 4⁄5 is correct before conditioning on the presence of a human, and it’s also correct after conditioning on the presence of a human; the number is 4⁄5 regardless. However, the situation we’re examining here is different, because the gnome cares about total $ even if no human is present. Thus we have a dilemma: it appears that UDT is correct in advising the gnome to precommit to 2⁄3, but the above argument also suggests that, after seeing a human in its cell, it is correct for the gnome to advise 4⁄5.
The key distinction, analogous to mwenger’s answer to Psy-Kosh’s non-anthropic problem, has to do with the possibility of a gnome in an empty cell. For a total utilitarian gnome in an empty cell, any money spent in the other cell translates directly into negative utility. That gnome would prefer the human in the other cell to spend nothing at all, but of course there is no way to make this happen, since the other gnome has no way of knowing that the first cell is empty.
The resolution to this problem is that, for linked decisions, you must (as UDT does) consider the effects of the decision over all a priori possible worlds affected by it. As it happens, this is exactly what you would do if you had the opportunity to precommit.
It’s a bit trickier to justify why this should be the case, but the best argument I can come up with is to apply that same “linked decision” reasoning at one meta-level up, the level of “linked decision theories”. In short, by adopting a decision theory that ignores linked decisions in a priori possible worlds that are excluded by your observations, you are licensing yourself and other agents to do the same thing in future decisions, which you don’t want. If other agents follow this reasoning, they will give the “yea” answer in Psy-Kosh’s non-anthropic problem, but you don’t want them to do that.
Note that decisions in worlds excluded by your observations do not usually turn out to be “linked”. This is because exclusion by observation would usually imply that you receive a different observation in the other possible world, thus allowing you to condition your decision on that observation, and thereby unlinking the decisions. However, some rare problems like the Counterfactual Mugging and Psy-Kosh’s non-anthropic problem violate this tendency, and should therefore be treated differently.
Overall, then, the “linked decision theory” argument supports adopting UDT, and it means that you should consider all linked decisions in all a priori possible worlds.
Thanks a lot for your comments, they were very insightful for me. Let me play the Advocatus Diaboli here and argue from the perspective of a selfish agent against your reasoning (and thus also against my own, less refined version of it).
“I object to the identification ‘S = $B’. I do not care about the money owned by the person in cell B; I only do so if that person is me. I do not know whether the coin has come up heads or tails, but I do not care how much money would have been paid or won by the other person who might have been in cell B had the coin come up differently. I only care about the money owned by the person in cell B in “this world”, where that person is me. I reject identifying myself with the other person who might have been in cell B had the coin come up differently, solely because that person would exist in the same cell as I do. My utility function thus cannot be expressed as a linear combination of $B and $C.
“I would pay a counterfactual mugger. In that case, there is a transfer, as it were, between two possible selves of mine that increases “our” total fortune. We are both possible descendants of the same past-self, to whom each of us is connected identically. The situation is quite different in the incubator case. There is no connection via a mutual past self between me and the other person who might have existed in cell B after a different outcome of the coin flip. This connection between my past and future selves is exactly what specifies my selfish goals. Actually, I don’t feel that the person who might have existed in cell B after a different outcome of the coin flip is “me” any more than the person in cell C is “me” (if that person exists). Since I will pay and win as much as the person in cell C (if they exist), I cannot win any money from them; and since I don’t care whether they exist at all, I think I should decide as an average utilitarian would. I will not pay more than $0.50.”
Is the egoist arguing this way mistaken? Or is our everyday notion of selfishness just not uniquely defined when it comes to the possibility of subjectively indistinguishable agents living in different “worlds”, since it rests on the dubious concept of personal identity? Can one understand selfishness both as caring about everyone living in circumstances subjectively identical to one’s own (and their future selves), and as caring only about those to whom one is directly connected? Do these two possibilities correspond to SIA-egoists and SSA-egoists, respectively, both of which are coherent?
First of all, I think your argument from connection of past/future selves is just a specific case of the more general argument for reflective consistency, and thus does not imply any kind of “selfishness” in and of itself. More detail is needed to specify a notion of selfishness.
I understand your argument against identifying yourself with another person who might counterfactually have been in the same cell, but the problem here is that if you don’t know how the coin actually came up you still have to assign amounts of “care” to the possible selves that you could actually be.
Let’s say that, as in my reasoning above, there are two cells, B and C; when the coin comes up tails humans are created in both cell B and cell C, but when the coin comes up heads a human is created in either cell B or cell C, with equal probability. Thus there are 3 “possible worlds”:
1) p=1/2 human in both cells
2) p=1/4 human in cell B, cell C empty
3) p=1/4 human in cell C, cell B empty
If you’re a selfish human and you know you’re in cell B, then you don’t care about world (3) at all, because there is no “you” in it. However, you still don’t know whether you’re in world (1) or (2), so you still have to “care” about both worlds. Moreover, in either world the “you” you care about is clearly the person in cell B, and so I think the only utility function that makes sense is S = $B. If you want to think about it in terms of either SSA-like or SIA-like assumptions, you get the same answer because both in world (1) and world (2) there is only a single observer who could be identified as “you”.
Now, what if you didn’t know whether you were in cell B or cell C? That’s where things are a little different. In that case, there are two observers in world (1), either of whom could be “you”. There are basically two different ways of assigning utility over the two different “yous” in world (1): adding them together, like a total utilitarian, or averaging them, like an average utilitarian; the resulting cutoffs are x = 2/3 and x = 1/2 respectively. Moreover, the first approach is equivalent to SIA, and the second is equivalent to SSA.
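(A small sketch of those two weightings over the three worlds listed above; E_add and E_avg are just illustrative names for the SIA-like and SSA-like options.)

    from fractions import Fraction as F
    # Worlds (1), (2), (3) from the list above, with the human unsure whether
    # they are the cell-B or cell-C person in world (1).
    def E_add(x):  # add up both possible "yous" in world (1)  (SIA-like)
        return F(1, 2) * 2 * (1 - x) + F(1, 4) * (-x) + F(1, 4) * (-x)
    def E_avg(x):  # average over them instead                 (SSA-like)
        return F(1, 2) * (1 - x) + F(1, 4) * (-x) + F(1, 4) * (-x)
    print(E_add(F(2, 3)), E_avg(F(1, 2)))  # 0 0: cutoffs at 2/3 and 1/2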
However, the SSA answer has a property that none of the others do. If the gnome were to tell the human “you’re in cell B”, an SSA-using human would change their cutoff point from 1⁄2 to 2⁄3. This seems rather strange indeed, because whether the human is in cell B or in cell C is not in any way relevant to the payoff. No human with any of the other utility functions we’ve considered would change their answer upon being told that they are in cell B.
“time for further detail on the problem with pre-emptively submissive gnomes.”

One of the aspects of what makes LW what it is—people with serious expressions on their faces discuss the problems with pre-emptively submissive gnomes and nobody blinks an eye X-D
I guess your comment means that you must have blinked an eye, so your comment can’t be completely true. That said, as discussions of pre-emptively submissive gnomes go, I would generally expect the amount of eye-blinking on LW to be well below average ^_~
I arched my eyebrow :-P
I like your analysis. Interestingly, the gnomes advise in the T and A cases for completely different reasons than in the S case.
But let me modify the case slightly: now the gnomes adopt the utility function of the closest human. This makes no difference to the T and A cases. But now in the S case, the gnomes have a linked decision, and
E[S] = 0.25(-x) + 0.25(-x) + 0.5(1-x) = 0.5-x
This also seems to satisfy “1) Their utility functions coincide exactly over all a priori possible worlds. 2) The humans do not have any extra information that the gnomes do not.” Also, the gnomes are now deciding the T, A and S cases for the same reasons (linked decisions).
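(A quick check of that E[S] calculation, on the stated assumption that the empty-cell gnome adopts the other human’s utility function and gives the same, linked advice:)

    from fractions import Fraction as F
    # "Closest human" gnomes: with its own cell empty (p = 1/4), the gnome now
    # adopts the other human's S, so the linked -x payment still counts.
    def E_S_closest(x):
        return F(1, 4) * (-x) + F(1, 4) * (-x) + F(1, 2) * (1 - x)  # = 1/2 - x
    print(E_S_closest(F(1, 2)))  # 0: breakeven at x = 1/2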
I don’t think that works, because 1) isn’t actually satisfied. The selfish human in cell B is indifferent over worlds where that same human doesn’t exist, but the gnome is not indifferent.
Consequently, I think that as one of the humans in your “closest human” case you shouldn’t follow the gnome’s advice, because the gnome’s recommendation is being influenced by a priori possible worlds that you don’t care about at all. This is the same reason a human with utility function T shouldn’t follow the recommendation of 4⁄5 from a gnome with utility function IT. Even though these recommendations are correct for the gnomes, they aren’t correct for the humans.
As for the “same reasons” comment, I think that doesn’t hold up either. The decisions in all of the cases are linked decisions, even in the simple case of U = S above. The difference in the S case is simply that the linked nature of the decision turns out to be irrelevant, because the other gnome’s decision has no effect on the first gnome’s utility. I would argue that the gnomes in all of the cases we’ve put forth have always had the “same reasons” in the sense that they’ve always been using the same decision algorithm, albeit with different utility functions.
Let’s ditch the gnomes; they are contributing little to this argument.
My “average utilitarian = selfish” argument was based on the fact that if you changed the utility function of everyone who existed from one system to the other, then people’s utilities would be the same, given that they existed.
The argument here is that if you changed the utility of everyone from one system to the other, then this would affect their counterfactual utility in the worlds where they don’t exist.
That seems… interesting. I’ll reflect further.
Yep, I think that’s a good summary. UDT-like reasoning depends on the utility values of counterfactual worlds, not just real ones.
I’m starting to think this is another version of the problem of personal identity… But I want to be thorough before posting anything more.
I think I’m starting to see the argument...