The counterfactual anti-mugging:
One day No-mega appears. No-mega is completely trustworthy, etc. No-mega describes the counterfactual mugging to you and predicts what you would have done in that situation, not having met No-mega, if Omega had asked you for $100.
If you would have given Omega the $100, No-mega gives you nothing. If you would not have given Omega $100, No-mega gives you $10000. No-mega doesn’t ask you any questions or offer you any choices. Do you get the money? Would an ideal rationalist get the money?
Okay, next scenario: you have a magic box with a number p inscribed on it. When you open it, either No-mega comes out (probability p) and performs a counterfactual anti-mugging, or Omega comes out (probability 1-p), flips a fair coin and proceeds to either ask for $100, give you $10000, or give you nothing, as in the counterfactual mugging.
Before you open the box, you have a chance to precommit. What do you do?
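One way to get a feel for the trade-off is to simulate opening the box under each possible precommitment. Here is a minimal sketch in Python, assuming only the payoffs stated above ($10,000 from either predictor, $100 handed over on tails); the function names are purely illustrative:

```python
import random

def open_box(precommit_pay: bool, p: float) -> float:
    """One opening of the box; returns the payoff in dollars."""
    if random.random() < p:
        # No-mega: pays $10,000 only to agents who would NOT pay Omega.
        return 0.0 if precommit_pay else 10_000.0
    # Omega: flips a fair coin.  Heads: pays $10,000 iff it predicts you
    # would have paid on tails.  Tails: asks for $100, which only the
    # precommitted-to-pay agent hands over.
    if random.random() < 0.5:
        return 10_000.0 if precommit_pay else 0.0
    return -100.0 if precommit_pay else 0.0

def average_payoff(precommit_pay: bool, p: float, trials: int = 200_000) -> float:
    return sum(open_box(precommit_pay, p) for _ in range(trials)) / trials

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, average_payoff(True, p), average_payoff(False, p))
```

For small p the paying policy comes out ahead; as p grows, refusing wins. That is the trade-off the expected-value analysis later in the thread pins down exactly.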
I would have no actionable suspicion that I should give Omega the $100 unless I knew about No-mega. So I get the $10000 only if No-mega asks the question “What would Eliezer do knowing about No-mega?” and not if No-mega asks the question “What would Eliezer do not knowing about No-mega?”
You forgot about MetaOmega, who gives you $10,000 if and only if No-mega wouldn’t have given you anything, and O-mega, who kills your family unless you’re an Alphabetic Decision Theorist. This comment doesn’t seem specifically anti-UDT (after all, Omega and No-mega are approximately equally likely to exist, a ratio of 1:1 if not an actual p of .5), but it still has the ring of Just Cheating. Admittedly, I don’t have any formal way of telling the difference between decision problems that feel more or less legitimate, but I think part of the answer might be that the Counterfactual Mugging isn’t really about how to act around superintelligences: it illustrates a more general need to condition our decisions on counterfactuals, and as EY pointed out, UDT still wins the No-mega problem if you know about No-mega, so whether or not we should subscribe to some decision theory isn’t all that dependent on which superintelligences we encounter.
I’m necroing pretty hard and might be assuming too much about what Caspian originally meant, so the above is more me working this out for myself than anything else. But if anyone can explain why the No-mega problem feels like cheating to me, that would be appreciated.
Do you have a point?

Yes: there can just as easily be a superintelligence that rewards people predicted to act one way as one that rewards people predicted to act the other. Which precommitment is most rational depends on which type you expect to encounter.
I don’t expect to encounter either, and on the other hand I can’t rule out fallible human analogues of either. So for now I’m not precommitting either way.
You don’t precommit to “give away the $100 to anyone who asks”. You precommit to give away the $100 in exactly the situation I described. Or, generalizing such precommitments, you just compute your decisions on the spot, in a reflectively consistent fashion. If that’s what you want to do with your future self, that is.
there can just as easily be a superintelligence that rewards people predicted to act one way as one that rewards people predicted to act the other.
Yeah, now. But after Omega really, really appears in front of you, the chance of Omega existing is about 1, while the chance of No-mega is still almost nonexistent. In this problem, the existence of Omega is given. It’s not something you are expecting to encounter now, just as we’re not expecting to encounter eccentric Kavkan billionaires who will give you money for intending to drink a toxin. Kavka’s toxin puzzle and the counterfactual mugging present a scenario that is given, and ask how you would act in it.
But you aren’t supposed to be updating… the essence of UDT, I believe, is that your policy should be set NOW, and NEVER UPDATED.
So… either:
You consider the choice of policy based on the prior where you DIDN’T KNOW whether you’d face Nomega or Omega, and NEVER UPDATE IT (this seems obviously wrong to me: why are you using your old prior instead of your current posterior?).
or
You consider the choice of policy based on the prior where you KNOW that you are facing Omega AND that the coin is tails, in which case paying Omega only loses you money.
It doesn’t prevent doing different actions in different circumstances, though. That’s not what “updateless” means. It means that you should act as your past self would have precommitted to doing in your situation. Your probability estimate for “I see Omega” should be significantly greater than for “I see Omega, and also Nomega is watching and deciding how to act”, so your decision should be mostly determined by Omega, not Nomega. (The Metanomega argument also applies: there’s a roughly equal chance of Metanomega or Nomega waiting and watching. Metanomega is Nomega reversed; it pays off iff it predicts you paying.)
I see where I went wrong. I assumed that the impact of one’s response to Omega is limited to the worlds in which Omega exists. That is, my reasoning is invalid if “what I do in scenario X” is meaningful and affects the world even if scenario X never happens; in other words, when one is being counterfactually modeled, which is exactly the topic of discussion.
Thanks for pointing that out. The answer is, as expected, a function of p.
So I now find explanations of why UDT gets mugged incomplete and misleading.
Here’s my analysis:
The action set is {give, don’t give}, which I’ll identify with {1, 0}.
Now, the possible deterministic policies are simply every mapping from {N,O} --> {1,0}, of which there are 4.
We can disregard the policies for which pi(N) = 1: No-mega never asks for money, so a policy’s N-component is irrelevant to the payoff. That leaves us with pi_give and pi_don’t, which give and don’t give, respectively, to Omega.
Now, we can easily compute expected values, measuring payoffs in units of $100 (so the $10000 prize is 100 and the $100 payment is 1):

r(pi_give(N)) = 0
r(pi_give(O, heads)) = 100
r(pi_give(O, tails)) = −1
r(pi_don’t(N)) = 100
r(pi_don’t(O)) = 0
So now:
Eg := E_give(r) = 0·p + 0.5·(100 − 1)·(1 − p) = 49.5·(1 − p)
Ed := E_don’t(r) = 100·p + 0·(1 − p) = 100·p
Eg > Ed whenever 49.5·(1 − p) > 100·p,
i.e. whenever 49.5 > 149.5·p,
i.e. whenever p < 99/299 (about 0.33).
So, whether you should precommit to being mugged depends on how likely you are to encounter N vs. O, which is intuitively obvious.
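As a sanity check on that algebra, here is a minimal sketch in Python under the same assumptions (payoffs measured in units of $100, p the probability the box contains No-mega; the function and variable names are mine, not anything standard):

```python
from fractions import Fraction

def expected_value(pay_omega: bool, p: Fraction) -> Fraction:
    """Expected payoff, in units of $100, of each precommitment."""
    if pay_omega:
        # No-mega branch pays 0; Omega branch pays 0.5*100 + 0.5*(-1) = 49.5.
        return (1 - p) * Fraction(99, 2)
    # No-mega branch pays 100; Omega branch pays 0 either way.
    return p * 100

# The policies break even where 49.5*(1 - p) = 100*p, i.e. p = 99/299.
threshold = Fraction(99, 299)
assert expected_value(True, threshold) == expected_value(False, threshold)
print(float(threshold))  # ~0.331: precommit to pay Omega iff p is below this.
```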