Great post by the way. Thank you. It sounds like your job is to think about this sort of thing!
I think I now believe that the answer to the original question can’t be £125, unless you already know what happens next.
Suppose the question is something like: “Every time you give me a penny, I’ll give you the next number. At any time you can stop and make your one guess.” It seems to me that there has to be a computer program that is best at playing this game. Do you have any idea what its stopping criterion would be? Or what the price would have to be for it to refuse to take any numbers at all?
It strikes me that this is actually a very dodgy problem indeed, and that if someone asks you these sorts of questions you should be very careful.
On the other hand it also strikes me that even in the absence of information about future offers, you should be prepared to pay something for the first number. You do, after all, expect to be £125 better off as a result of knowing it!
I have a queasy feeling of paradox and I notice that I am confused.
I put some time into solving this problem, and have reached a point where the amount of algebra necessary to continue is beyond what I’m willing to do. (The problem is that the transition probabilities are piecewise functions of the odds, and that makes everything unfun.) I have thought of an analogous problem that’s mathematically simpler (basically, it’ll be the unfair coin, and the reward will be based on guessing the degree of unfairness, not which of two it is), and I’ll write up a longer explanation of how to do it sometime over the weekend.
I’ll look forward to it. Don’t put time into this unless you’re enjoying it. I haven’t seen Oswald in ages, and my current commitment is a mental note to either think about the biased coin version or write some computer simulations next time I’m bored.
So, not quite an explanation, more of an exercise:
Oswald brings his laptop to a bar, loads up Matlab, and types:
p=rand(); c=0;
p is now a double between 0 and 1, which we can treat as continuously and uniformly distributed across that range. c is the number of times you’ve gotten a hint.
Now, Oswald types in another line:
c=c+1, rand()<p
This will both increase the number of hints you’ve received and give you a 1 if a new, uniformly selected random number is smaller than the first random number, or a 0 otherwise. (Basically, this is flipping a biased coin which gives ‘heads’ with probability p and ‘tails’ with probability 1-p.) You can repeat this line as many times as you like.
Now, this bar is called The Improper Prior, and as such is filled with Bayesians. It’s readily obvious to the patrons that their posterior on p should be a beta distribution, with α equal to one plus the number of 1s and β equal to one plus the number of 0s.
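The update the patrons have in mind can be sketched in a few lines. This is a Python stand-in for Oswald's Matlab session, not his code; the seed, the count of 20 hints, and the function name are arbitrary assumptions for illustration.

```python
import random

def posterior_after_hints(p, n_hints, rng):
    """Flip a p-biased coin n_hints times and return the Beta(alpha, beta) parameters."""
    alpha, beta = 1, 1              # uniform prior on p is Beta(1, 1)
    for _ in range(n_hints):
        if rng.random() < p:        # same test as rand()<p
            alpha += 1              # saw a 1
        else:
            beta += 1               # saw a 0
    return alpha, beta

rng = random.Random(0)              # fixed seed, chosen arbitrarily
p = rng.random()                    # Oswald's hidden bias
alpha, beta = posterior_after_hints(p, 20, rng)
posterior_mean = alpha / (alpha + beta)
print(alpha, beta, posterior_mean)
```

Only the two counts matter; the order in which the 1s and 0s arrived does not.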
But now is when things get interesting: your chance of guessing p exactly is basically zero. So Oswald might instead reward you for guessing within .05 of the actual p. More hints should be penalized, either by decreasing the acceptable range or by decreasing the reward for guessing correctly. Alternatively, Oswald might reward you based on the precision of your posterior, or some other function.
Unfortunately, the beta distribution’s cdf is not pleasant to play with. Matlab can deal with it easily- just type:
betainc(x,a,b)
We could determine the chance that your guess x is within .05 of the correct value by typing:
betainc(x+.05,a,b)-betainc(x-.05,a,b)
Unfortunately (again!), this isn’t maximized by centering your estimate at the mean, unless a=b. You can test this with a=3, b=2: the mean is .6, and centering there gives about .172, while centering at .65 gives about .1765.
And so if Oswald uses this reward system, we’ll have to solve an optimization problem to determine what our guess is at each stage, which isn’t going to be fun. (The dumb way to do it throws
betainc(x+.05,a,b)-betainc(x-.05,a,b);
into some nonlinear optimization algorithm which shifts around x until it finds a local maximum, starting with a/(a+b) as the guess. What’s the smart way to do it?)
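The dumb way can be sketched as follows, in Python rather than Matlab. Two assumptions are baked in: a and b are positive integers, so the Beta cdf reduces to a binomial tail sum (standing in for betainc), and the optimizer is a plain shrinking-step hill climb, which is an arbitrary choice and not one the post names.

```python
from math import comb

def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), valid for positive integer a and b."""
    x = min(max(x, 0.0), 1.0)       # clamp window edges that fall outside [0, 1]
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def window_prob(x, a, b, half_width=0.05):
    """Posterior probability that p lies within half_width of the guess x."""
    return beta_cdf(x + half_width, a, b) - beta_cdf(x - half_width, a, b)

def best_guess(a, b, half_width=0.05, tol=1e-6):
    """Hill-climb from the posterior mean, halving the step whenever no move helps."""
    x = a / (a + b)
    step = half_width
    while step > tol:
        here = window_prob(x, a, b, half_width)
        moved = False
        for cand in (x - step, x + step):
            if 0.0 <= cand <= 1.0 and window_prob(cand, a, b, half_width) > here:
                x, moved = cand, True
                break
        if not moved:
            step /= 2
    return x

a, b = 3, 2
x_mean = a / (a + b)
x_best = best_guess(a, b)
print(x_mean, window_prob(x_mean, a, b))
print(x_best, window_prob(x_best, a, b))
```

For a=3, b=2 the climb drifts from the mean toward the mode, which is the direction you would expect for a skewed posterior.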
Oswald might also be reluctant to reward us based on precision, because that can grow enormously high as α and β increase. So instead let’s suppose he offers a flat reward, minus some constant times the variance, minus some constant times the number of hints we bought, and he wants to know how to price entry into the game, so he can set the expected profit where he wants it to be.
Now we’re in an interesting situation, because the variance can increase or decrease based on what we’ve seen. If you get two heads in a row, the posterior is Beta(3,1) and the variance, αβ/((α+β)²(α+β+1)), is about .038; a tails will increase it to .040, and a third heads will decrease it to about .027. On average, you expect the variance after you see another coin to be .030. On average, the variance should always decrease after we get another hint. We also know that the amount each hint is expected to lower our variance will be a decreasing function of α and β for large enough values. (Really? Why would you believe those two statements?)
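These variances are easy to check with exact fractions. The sketch below assumes the posterior described earlier (α equal to one plus the number of 1s, β equal to one plus the number of 0s) and the standard Beta variance formula; the function names are made up for illustration. The final loop is a numerical check over small integer states, not a proof of the claim that the expected variance always falls.

```python
from fractions import Fraction

def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    return Fraction(a * b, (a + b) ** 2 * (a + b + 1))

def expected_var_after_hint(a, b):
    """Average of the two possible next variances, weighted by the predictive odds."""
    p_heads = Fraction(a, a + b)    # posterior predictive probability of a 1
    return p_heads * beta_var(a + 1, b) + (1 - p_heads) * beta_var(a, b + 1)

now = beta_var(3, 1)                # after two heads
after_tails = beta_var(3, 2)
after_heads = beta_var(4, 1)
expected = expected_var_after_hint(3, 1)
print(float(now), float(after_tails), float(after_heads), float(expected))

# Does one more hint lower the variance on average in every small state?
always_decreases = all(
    expected_var_after_hint(a, b) < beta_var(a, b)
    for a in range(1, 31)
    for b in range(1, 31)
)
print(always_decreases)
```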
We can now easily calculate the actual variance and the expected variance after another hint for any (α,β) pair. If the costs are fixed we can determine when it wouldn’t be worthwhile to buy one more. If α and β are large enough, that’ll be enough for us to stop because we know future hints will be less valuable than the current hint and the current hint is a bad idea.
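A one-hint lookahead along these lines might look like the sketch below. The penalty parameters k_var (cost per unit of variance) and k_hint (cost per hint) are hypothetical names for the two constants in Oswald's reward. Note that this rule only asks whether a single extra hint pays for itself, so on its own it is not guaranteed to be the optimal stopping rule in every state.

```python
def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def expected_drop(a, b):
    """Expected fall in variance from buying exactly one more hint."""
    p_heads = a / (a + b)
    after = p_heads * beta_var(a + 1, b) + (1 - p_heads) * beta_var(a, b + 1)
    return beta_var(a, b) - after

def worth_one_more(a, b, k_var, k_hint):
    """True if the variance penalty saved on average exceeds the price of the hint."""
    return k_var * expected_drop(a, b) > k_hint

print(worth_one_more(1, 1, k_var=1.0, k_hint=0.01))    # fresh game
print(worth_one_more(50, 50, k_var=1.0, k_hint=0.01))  # after 98 hints
```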
We can then propagate backwards from the terminal states to determine the total value of playing the game optimally. We also can be certain this game valuation procedure will terminate in reasonable time for reasonable choices of the penalty parameters. (Again, why?)
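One way to sketch the backward pass, under assumptions that go beyond the post: the payoff is reward minus k_var times the variance at stopping minus k_hint per hint (all three parameter names are hypothetical), and the horizon is chosen from the bound that a Beta variance with α+β=n is at most 1/(4(n+1)). Once k_var times that bound drops below k_hint, no future hint can pay for itself, so every state at that depth is terminal.

```python
from math import ceil

def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def game_value(reward, k_var, k_hint):
    """Expected payoff of optimal play starting from the uniform prior Beta(1, 1)."""
    # At depth n = a + b >= k_var / (4 * k_hint), stopping beats any continuation.
    n = max(2, ceil(k_var / (4 * k_hint)))
    # layer[a] is the value, net of hints already paid for, of state (a, n - a).
    layer = {a: -k_var * beta_var(a, n - a) for a in range(1, n)}
    while n > 2:
        n -= 1
        new_layer = {}
        for a in range(1, n):
            b = n - a
            stop = -k_var * beta_var(a, b)
            p_heads = a / n
            cont = -k_hint + p_heads * layer[a + 1] + (1 - p_heads) * layer[a]
            new_layer[a] = max(stop, cont)
        layer = new_layer
    return reward + layer[1]

pricey = game_value(reward=1.0, k_var=1.0, k_hint=1.0)    # hints never worth it
cheap = game_value(reward=1.0, k_var=1.0, k_hint=0.001)   # hints worth buying early on
print(pricey, cheap)
```

The work grows roughly with the square of the horizon, which is one reason to expect the valuation to finish quickly unless hints are nearly free relative to the variance penalty.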