Christopher King comments on Learning as you play: anthropic shadow in deadly games

Christopher King 6 Dec 2023 17:32 UTC
1 point
Maximizing expected utility in Chinese Roulette requires Bayesian updating.

Let’s say on priors that P(n=1) = p and that P(n=5) = 1-p. Call this instance of the game G_p.

Let’s say that you shoot rather than quit in the first round. For G_1/2, there are four possibilities:
1. n = 1, vase destroyed: The probability of this scenario is ¹⁄₁₂. No further choices are needed.
2. n = 5, vase destroyed: The probability of this scenario is ⁵⁄₁₂. No further choices are needed.
3. n = 1, vase survived: The probability of this scenario is ⁵⁄₁₂. The player needs a strategy to continue playing.
4. n = 5, vase survived: The probability of this scenario is ¹⁄₁₂. The player needs a strategy to continue playing.
Notice that the strategy must be the same for 3 and 4 since the observations are the same. Call this strategy S.
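
The four probabilities can be checked with a short sketch. It assumes, consistent with the numbers above, that n counts the bullets in a six-chamber gun, so a shot destroys the vase with probability n/6. The function name `scenario_probabilities` is made up for illustration.

```python
from fractions import Fraction

def scenario_probabilities(p, chambers=6):
    """Joint probability of each (n, outcome) pair after one shot in G_p.

    Assumption: n is the number of bullets in a gun with `chambers`
    chambers, so the vase is destroyed with probability n / chambers.
    """
    prior = {1: Fraction(p), 5: 1 - Fraction(p)}
    table = {}
    for n, prior_n in prior.items():
        p_destroyed = Fraction(n, chambers)
        table[(n, "destroyed")] = prior_n * p_destroyed
        table[(n, "survived")] = prior_n * (1 - p_destroyed)
    return table

probs = scenario_probabilities(Fraction(1, 2))
for scenario, prob in probs.items():
    print(scenario, prob)
```

The two "survived" entries are the ones that matter for choosing S, since the game only continues in those worlds.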

The expected utility, which we seek to maximize, is:

E[U(shoot and then S)] = 0 + ⁵⁄₁₂ * (R + E[U(S) | n = 1]) + ¹⁄₁₂ * (R + E[U(S) | n = 5])

Among the worlds where the vase survives, the n = 1 worlds carry five times the weight of the n = 5 worlds, so they determine most of our utility.

Collecting the R terms and factoring out ¹⁄₂, we get:

E[U(shoot and then S)] = R/2 + ¹⁄₂ * (⁵⁄₆ * E[U(S) | n = 1] + ¹⁄₆ * E[U(S) | n = 5])

But the expression ⁵⁄₆ * E[U(S) | n = 1] + ¹⁄₆ * E[U(S) | n = 5] is the expected utility if we were playing G_5/6. So the optimal S is the optimal strategy for G_5/6. This is the same as doing a Bayesian update: prior odds of 1:1 times a likelihood ratio of 5:1 give posterior odds of 5:1, i.e. P(n = 1) = ⁵⁄₆.
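
As a check on that last step, here is a minimal sketch of the update itself, again assuming a six-chamber gun holding n bullets. The helper name `posterior_n1_given_survival` is hypothetical. It computes P(n = 1 | vase survived) by Bayes' rule and compares it with the odds-form calculation.

```python
from fractions import Fraction

def posterior_n1_given_survival(p, chambers=6):
    """P(n = 1 | vase survived one shot), starting from G_p."""
    p = Fraction(p)
    survive_n1 = 1 - Fraction(1, chambers)  # 5/6 with six chambers
    survive_n5 = 1 - Fraction(5, chambers)  # 1/6 with six chambers
    joint_n1 = p * survive_n1
    joint_n5 = (1 - p) * survive_n5
    return joint_n1 / (joint_n1 + joint_n5)

posterior = posterior_n1_given_survival(Fraction(1, 2))

# Odds form: prior odds 1:1 times likelihood ratio (5/6):(1/6) = 5:1.
prior_odds = Fraction(1, 1)
likelihood_ratio = Fraction(5, 6) / Fraction(1, 6)
posterior_odds = prior_odds * likelihood_ratio
posterior_from_odds = posterior_odds / (1 + posterior_odds)

print(posterior, posterior_from_odds)
```

Both routes give ⁵⁄₆ for G_1/2. For a general prior p, the same function returns 5p / (4p + 1), so surviving a shot always moves you from G_p to a game with more weight on n = 1.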