Thanks for pointing that out. The answer is, as expected, a function of p.
So I now find explanations of why UDT gets mugged incomplete and misleading.
Here’s my analysis:
The action set is {give, don’t give}, which I’ll identify with {1, 0}.
Now, the possible deterministic policies are simply all the mappings from {N, O} to {1, 0}, of which there are 4.
We can disregard the policies for which pi(N) = 1, since giving money to Nomega serves no purpose.
So we’re left with pi_give and pi_don’t, which give and don’t give to Omega, respectively.
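To make the policy count concrete, here is a small sketch that enumerates the four deterministic policies and drops the ones that give to Nomega. The names SITUATIONS, ACTIONS, policies and remaining are my own labels for illustration, not anything from the original setup.

```python
from itertools import product

# Hypothetical labels: "N" = facing Nomega, "O" = facing Omega.
SITUATIONS = ("N", "O")
# 1 = give, 0 = don't give.
ACTIONS = (1, 0)

# Every deterministic policy is a mapping {N, O} -> {1, 0}.
policies = [
    dict(zip(SITUATIONS, choice))
    for choice in product(ACTIONS, repeat=len(SITUATIONS))
]

# Drop the policies that give to Nomega (pi(N) = 1).
remaining = [pi for pi in policies if pi["N"] == 0]

print(len(policies))
print(remaining)
```

The two surviving mappings correspond to pi_give (O maps to 1) and pi_don’t (O maps to 0).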
Now, we can easily compute expected value, as follows:
r (pi_give(N)) = 0
r (pi_give(O, heads)) = 10
r (pi_give(O, tails)) = −1
r (pi_don’t(N)) = 10
r (pi_don’t(O)) = 0
So now, with p the probability of encountering N (and 1-p the probability of encountering O):
Eg := E_give(r) = 0 p + .5 (10-1) * (1-p)
Ed := E_don’t(r) = 10 p + 0 (1-p)
Eg > Ed whenever 4.5 (1-p) > 10 p,
i.e. whenever 4.5 > 14.5 p
i.e. whenever 9⁄29 > p
So, whether you should precommit to being mugged depends on how likely you are to encounter N vs. O, which is intuitively obvious.
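As a sanity check on the arithmetic, here is a minimal sketch that evaluates both expected values with exact fractions. It assumes the payoffs listed above and a fair coin; the function names expected_give and expected_dont and the constant THRESHOLD are just labels I picked.

```python
from fractions import Fraction


def expected_give(p):
    """Expected reward of pi_give, where p is the probability of meeting N."""
    # N pays 0; O pays 10 on heads and -1 on tails, each with probability 1/2.
    return 0 * p + Fraction(1, 2) * (10 - 1) * (1 - p)


def expected_dont(p):
    """Expected reward of pi_don't, where p is the probability of meeting N."""
    # N pays 10; O pays 0.
    return 10 * p + 0 * (1 - p)


# Break-even point from the derivation: 4.5 (1-p) = 10 p  =>  p = 9/29.
THRESHOLD = Fraction(9, 29)

for p in (Fraction(3, 10), THRESHOLD, Fraction(1, 3)):
    print(p, expected_give(p), expected_dont(p))
```

Below the threshold (e.g. p = 3/10) giving comes out ahead, at p = 9/29 the two policies tie, and above it (e.g. p = 1/3) not giving wins.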