Hmm. This show is interesting, because it feels like the best approach is to allocate the money according to your probabilities: if you’re 70% sure it’s A and 10% sure for each of B, C, and D, you should put £700,000 on A and £100,000 each on B, C, and D.
But the expected value for doing that is actually significantly lower than putting all of your money on your best guess. For the example I gave, you would expect to have £520,000 if you spread out according to your probabilities and £700,000 if you put everything on your best guess.
Suppose, like most people, you have a utility function that’s roughly log(wealth). Then how you should play depends on your initial wealth: if you have modest wealth (say, £10,000), then you should only slightly overweight your surest guess (betting 72% on it, continuing with the 70-10-10-10 state of knowledge). If you have significant wealth (say, £100,000), then you should put quite a bit more onto your surest guess (88%). Answering honestly only maximizes utility if you have 0 wealth (i.e. you assign infinite disutility to leaving with no money).
(Those calculations were all done for a single question, not 8.)
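A minimal sketch checking those numbers, under my reading of the rules (you split a £1,000,000 stake across the four answers and keep whatever sits on the correct one; W0 is wealth held outside the game):

```python
import numpy as np

M = 1_000_000
q = np.array([0.7, 0.1, 0.1, 0.1])   # the 70-10-10-10 beliefs from above

# Expected money for the two strategies in the first two paragraphs:
print(q @ (M * q))   # 520,000: spreading according to your probabilities
print(q[0] * M)      # 700,000: everything on the best guess

def optimal_split(q, W0, M=M):
    """Maximize sum_i q_i * log(W0 + M * x_i) over splits x summing to 1.
    The Lagrange condition q_i / (W0 + M * x_i) = const gives the interior
    solution below; it is valid while every entry stays nonnegative."""
    n = len(q)
    return q + (n * q - 1) * W0 / M

for W0 in (0, 10_000, 100_000):
    print(W0, np.round(optimal_split(q, W0), 3))
# 0       -> [0.7   0.1   0.1   0.1  ]  honest reporting
# 10000   -> [0.718 0.094 0.094 0.094]  ~72% on the surest guess
# 100000  -> [0.88  0.04  0.04  0.04 ]  ~88% on the surest guess
```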
Probability matching
What’s interesting is that studies which offered significant monetary incentives have seen less probability matching, and some researchers have suggested it might simply be an artifact of bored undergrads who don’t care about getting the answer right, or of pattern recency effects (like the gambler’s fallacy, since all the experiments use frequency over a sequence as their measure of probability). Here we have what looks like probability matching with monetary incentives 6-7 orders of magnitude greater than those used in the lab, and it is probability matching with subjective probability, which would eliminate any gambler’s fallacy effect.
For zero initial wealth and log-wealth utility, answering “honestly” is optimal even for the many-round generalization. I wrote a script and realized this through experiments, but it is obvious in retrospect. A very nice fact anyway. I guess it can be turned into some parable about maximum log-likelihoods.
EDIT: Second paragraph about nonzero initial wealth retracted because I found a bug in my script. Zero initial wealth case unaffected.
EDIT 2: Wow, this is beautiful. Is this well-known? I just had two realizations. The first was that my original analysis only covered the case where I get to hear all the questions at the beginning of the game. The second was that despite my sloppy original analysis, the statement is true even in the more realistic case where I only hear the next question after answering the previous one. It’s always worth being honest.
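A small numeric check of the sequential claim, with made-up probabilities for a two-round version: since final wealth is the stake times the product of the fractions placed on the correct answers, the per-round log terms separate, and the sweep below finds the honest split optimal in round 1 as well.

```python
import numpy as np

M = 1_000_000
q1 = np.array([0.7, 0.3])   # made-up round-1 beliefs
q2 = np.array([0.6, 0.4])   # made-up round-2 beliefs

def expected_log(x1, x2):
    # Final wealth is M * x1[i] * x2[j]; expectation over independent rounds.
    return sum(q1[i] * q2[j] * np.log(M * x1[i] * x2[j])
               for i in range(2) for j in range(2))

# Sweep round-1 splits while playing honestly (x2 = q2) in round 2:
best = max((expected_log(np.array([a, 1 - a]), q2), round(a, 2))
           for a in np.linspace(0.01, 0.99, 99))
print(best)   # maximum at a = 0.7, i.e. honest betting in round 1 too
```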
Yes, it is well known. Log utility is used not because it is particularly realistic, but because it makes this calculation easy.
Looking at some much more advanced related papers, I am now sure that it is well known. But I’d still love to see some reference, be it a paper or a textbook. Could you please help me with this?
Maybe Bernoulli’s 1738 paper on the St Petersburg paradox, where he suggested that utility should be the log of wealth.
Really? That’s cool. How about the slightly more general version that I stated down-thread? I hope at least that one would have been news to Bernoulli; entropy hadn’t been invented yet.
For zero initial wealth and log-wealth utility, answering “honestly” is optimal even for the many-round generalization.

Yep. Notice that if you have external wealth, the number of questions is relevant to deciding how much to overweight your best guess.
Yes.
Yes, I knew about the properness of the logarithmic scoring rule. But I think my pretty little result is not covered by your Wikipedia link. Here it is, for completeness: if I play the multi-turn version of the game, with log utility and zero initial wealth, then my best strategy is honesty in each turn, even if they tell me the questions one by one. (Actually, they can reveal the questions to me in any order and on any schedule, and the statement still holds.) I think this is a nontrivial generalization, in the sense that the (completely trivial) proof crucially depends on the utility being logarithmic.
I think it results from the scale independence of log (as soon as you add external wealth, the scale independence goes away). That lets you treat every question separately, since the previous questions determine only its scale, and that doesn’t affect the maximization. It is a pretty result, but since people don’t often talk about this sort of problem, I don’t know whether I would call it well known.
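One way to write out that separation, in my notation (zero initial wealth, stake M, beliefs q^{(t)} and bets x^{(t)} on round t, independent questions):

$$\mathbb{E}\left[\log\left(M \, x^{(1)}_{i_1} \cdots x^{(T)}_{i_T}\right)\right] = \log M + \sum_{t=1}^{T} \sum_i q^{(t)}_i \log x^{(t)}_i = \log M - \sum_{t=1}^{T} \left( H\big(q^{(t)}\big) + D_{\mathrm{KL}}\big(q^{(t)} \,\|\, x^{(t)}\big) \right)$$

Each KL term is zero exactly when x^{(t)} = q^{(t)}, so honesty is optimal round by round, regardless of the reveal schedule.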
It’s also cool to work backward from the last question and see how conditional probabilities connect the full-knowledge case and the discovery case.
And here is the prettiest part: the whole thing seems to work even if we don’t assume that the answers to the quiz questions are independent random variables. The expected utility always equals the log of the initial stake minus the entropy of the joint distribution of the variables, and the best strategy is always honesty in every turn. Note the statement about entropy; it’s a new motif. Before generalizing to non-independent variables it did not add much value, but in the current version it is quite a powerful statement.
For example, let the first question be a 50%-50% A-B, but let the second question depend on the correct answer to the first in the following way: if it was A, then the second question is a 50%-50% C-D, but if it was B, then the second question is a 100% C. At first it seemed to me that we could win something here by not being honest in the first round, but actually we can’t. The entropy of the joint distribution is 1.5 bits, and the only way to achieve the corresponding expected utility is by betting 50%-50% in the first round.
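A quick check of this example (my assumptions: stake M, honest conditional bets in round 2; final wealth is M times the product of the fractions placed on the correct answers):

```python
import numpy as np

M = 1_000_000
# Joint distribution of the example: 50/50 A/B in round 1, then 50/50 C/D
# after A but certainly C after B.
joint = {('A', 'C'): 0.25, ('A', 'D'): 0.25, ('B', 'C'): 0.50}

def expected_log2(a):
    """E[log2(final wealth)] betting (a, 1-a) on A/B in round 1 and the
    true conditional probabilities in round 2."""
    bets = {('A', 'C'): a * 0.5, ('A', 'D'): a * 0.5, ('B', 'C'): (1 - a)}
    return sum(p * np.log2(M * bets[o]) for o, p in joint.items())

print(expected_log2(0.5))   # honest round 1: exactly log2(M) - 1.5
print(expected_log2(0.6))   # any other round-1 split does strictly worse
print(np.log2(M) - 1.5)     # the entropy bound: 1.5 bits below log2(M)
```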
I don’t see how this can be true in conjunction with Vaniver’s post. In particular, suppose you start with 0 wealth, and get a question right by being honest. Now you have e.g. £10,000 in winnings.
Then wouldn’t the problem of answering the second question be isomorphic to the problem of answering the first question when you don’t start with zero wealth, which as we’ve seen involves being dishonest?
I have two hypotheses explaining this. One is that I don’t understand something about the game show rules. Another is that you’re combining winnings improperly: winning £X then £Y should give you log(X+Y) utility, not log(X) + log(Y) (and if you mistakenly do the latter, then I think honesty is the best option in all cases).
You start with zero wealth. The quiz show host “gives” you a million pounds to play with, but you can only take home the money left after the last round. The intermediate levels of imaginary wealth do not affect the calculation.
Oh, I see, I was completely misunderstanding the problem.
Lawful Uncertainty
How interesting! I guess the reason we intuitively want to allocate according to probability is prospect theory. In this example the “obvious” zero point is the start of the game, and you can never end up worse than that. So the only piece of the value function that can come into play is the concave part starting at zero, i.e. it looks qualitatively like the log curve for someone with no prior wealth.
Maximize expected utility, not expected money. If utility is linear in money, you should put all the money on the best option. If utility is the logarithm of money, you should distribute it in proportion to your probabilities. In the show (or its Russian version, in any case), you can only put money on 3 out of 4 options, which means that you should put money in proportion to the probabilities of the 3 best options (assuming utility is log-money).
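For completeness, a sketch of that constrained split, under my assumptions (zero outside wealth; the split only affects the branches where a backed option turns out correct, and there the same Lagrange argument gives the renormalized probabilities):

```python
import numpy as np

q = np.array([0.7, 0.1, 0.1, 0.1])   # subjective probabilities
S = np.argsort(q)[::-1][:3]          # back the 3 most likely options

# Within S, maximizing sum_{i in S} q_i * log(x_i) subject to sum(x) = 1
# gives the probabilities renormalized over S; the dropped option is the
# least likely one (arbitrary among the tied 10% options here).
x = np.zeros(4)
x[S] = q[S] / q[S].sum()
print(np.round(x, 3))   # 0.7/0.9 ≈ 0.778 on the favorite, ≈ 0.111 on the other two
```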