David Scott Krueger (formerly: capybaralet) comments on Counterfactual Mugging

David Scott Krueger (formerly: capybaralet) 30 Jan 2017 18:08 UTC
0 points
Thanks for pointing that out. The answer is, as expected, a function of p. So I now find explanations of why UDT gets mugged incomplete and misleading.

Here’s my analysis:

The action set is {give, don’t give}, which I’ll identify with {1, 0}. The possible deterministic policies are then all mappings from {N, O} (Nomega, Omega) --> {1, 0}, of which there are 4.

We can disregard the policies for which pi(N) = 1, since giving money to Nomega serves no purpose. So we’re left with

pi_give and pi_don’t, which give/don’t, respectively, to Omega.
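As a quick sanity check on the counting, here is a minimal sketch that enumerates the deterministic policies and drops the ones that give to Nomega. The names STATES, ACTIONS, policies and remaining are my own labels for illustration, not part of the original setup.

```python
from itertools import product

STATES = ("N", "O")   # N = Nomega, O = Omega
ACTIONS = (1, 0)      # 1 = give, 0 = don't give

# A deterministic policy picks one action per state.
policies = [dict(zip(STATES, acts))
            for acts in product(ACTIONS, repeat=len(STATES))]

# Disregard policies with pi(N) = 1 (giving to Nomega).
remaining = [pi for pi in policies if pi["N"] == 0]

print(len(policies))  # 4
print(remaining)      # [{'N': 0, 'O': 1}, {'N': 0, 'O': 0}]
```

The two surviving dictionaries correspond to pi_give and pi_don’t, in that order.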

Now we can easily compute expected value. First, the reward r for each policy in each situation:

r (pi_give(N)) = 0

r (pi_give(O, heads)) = 10

r (pi_give(O, tails)) = −1

r (pi_don’t(N)) = 10

r (pi_don’t(O)) = 0

So now, writing p for the probability of encountering N (and 1-p for O):

Eg := E_give(r) = 0 p + .5 (10-1) * (1-p)

Ed := E_don’t(r) = 10 p + 0 (1-p)

Eg > Ed whenever 4.5 (1-p) > 10 p,

i.e. whenever 4.5 > 14.5 p

i.e. whenever ⁹⁄₂₉ > p
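The algebra above can be checked with exact arithmetic. This is only a sketch under the reward values listed earlier; the function names expected_give and expected_dont and the constant THRESHOLD are hypothetical names I am introducing here, and p is taken to be the probability of encountering N.

```python
from fractions import Fraction

def expected_give(p):
    # N pays 0; O pays 10 on heads and costs 1 on tails (fair coin).
    return 0 * p + Fraction(1, 2) * (10 - 1) * (1 - p)

def expected_dont(p):
    # N pays 10; O pays 0.
    return 10 * p + 0 * (1 - p)

THRESHOLD = Fraction(9, 29)

# One value below the threshold, the threshold itself, one above.
for p in (Fraction(1, 10), THRESHOLD, Fraction(1, 2)):
    eg, ed = expected_give(p), expected_dont(p)
    print(p, eg, ed, eg > ed)
```

At p = 9/29 the two expectations coincide (both are 90/29); below that giving comes out ahead, and above it not giving does.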

So, whether you should precommit to being mugged depends on how likely you are to encounter N vs. O, which is intuitively obvious.
What links here?
- Does UDT *really* get counter-factually mugged? by IAFF-User-111 (4 Feb 2017 21:46 UTC; 0 points)