This doesn’t make sense to me. Why am I not allowed to update on still being in the game?
I noticed that in your problem setup you deliberately removed n=6 from being in the prior distribution. That feels like cheating to me—it seems like a perfectly valid hypothesis.
After seeing the first chamber come up empty, that should definitively update me away from n=6. Why can’t I update away from n=5?
Yes, the n=6 case is special. I didn’t mean to “cheat”; I simply excluded it because it’s trivial. But past the certainty that the game isn’t rigged that badly, you can’t gain anything else from it. If you didn’t condition on the probability of observing the sequence, nothing would actually change anyway. Your probability distribution would be
$$P(n) \propto \left(1 - \frac{n}{6}\right)^N$$
(properly normalized, of course). This skews the distribution ever further towards low values of n, irrespective of any information about the actual gun. In other words, if you didn’t quit at the beginning, this will never make you quit—you will think you’re safer and safer by sheer virtue of playing longer, irrespective of whether you actually are. So, what use are you getting out of this information? None at all. If you are in a game that is worth playing, you gain zero; you would have played anyway. If you are in a game that is not worth playing, you lose in expectation the difference $V - W_{\text{PLAY}}$. So either way, this information is worthless. The only useful information is information that behaves differently (again, in expectation) between a world in which the optimal strategy is to play and one in which the optimal strategy is to quit, thereby allowing you to make better decisions. But there is no such information you can gain in this game upstream of your decision.
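To make this concrete, here is a minimal sketch in Python, under the assumptions of a six-chamber revolver with n bullets and a uniform prior over n = 1..5 (n = 6 excluded as trivial); the helper name `survival_posterior` is mine, purely for illustration:

```python
# A minimal sketch, assuming a six-chamber revolver with n bullets and a
# uniform prior over n = 1..5 (n = 6 excluded as trivial, per the text).
# It computes the "update on still being alive" belief
# P(n) proportional to (1 - n/6)^N after surviving N trigger pulls.

N_CHAMBERS = 6
NS = range(1, 6)

def survival_posterior(N):
    """Belief over n after surviving N pulls, normalized over n."""
    weights = [(1 - n / N_CHAMBERS) ** N for n in NS]
    total = sum(weights)
    return {n: w / total for n, w in zip(NS, weights)}

# The belief depends only on N, never on the actual gun:
for N in (0, 1, 5, 20):
    post = survival_posterior(N)
    print(f"N={N:>2}: " + "  ".join(f"P({n})={p:.3f}" for n, p in post.items()))
# As N grows, the mass piles up on n = 1: every survivor "feels safer",
# whether the true n is 1 or 5.
```

Running it shows the same drift for every survivor, which is the point: the update carries no signal that could distinguish a safe gun from a dangerous one.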
Also, please notice that in the second game, the one with the blanks, my criterion lets you define a belief distribution that you can actually get some use out of. But if we consistently applied your suggested criterion, and did not normalize over observable paths, then the belief after E empty chambers would just be
$$P(b; E) = (E+1)\,b^E$$
which behaves exactly like the function above. It is not really affected by your actual trajectory: it will simply convince you that playing is safer every time an empty chamber comes up, and it can’t change your optimal strategy, as the sketch below illustrates. Which means, again, you can’t get any usefulness out of it. This is an example of a game in which using the other approach can instead yield some gains.
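A quick sketch of that monotonic behavior, where b is the (unknown) probability that a chamber is empty and E the count of empty chambers seen so far; `posterior_mean` is an illustrative name of mine, not from the original discussion:

```python
# A minimal sketch of the belief P(b; E) = (E+1) * b**E in the blanks game,
# where b is the (unknown) probability that a chamber is empty and E the
# number of empty chambers seen so far.

def posterior_mean(E):
    """Mean of the density (E+1) b^E on [0, 1], which works out to (E+1)/(E+2)."""
    return (E + 1) / (E + 2)

for E in range(6):
    print(f"E={E}: expected b = {posterior_mean(E):.3f}")
# The expectation (E+1)/(E+2) rises monotonically with E: each empty
# chamber only ever makes playing look safer, so this belief can never
# flip a "play" decision into "quit".
```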