I’ll look forward to it. Don’t put time into this unless you’re enjoying it. I haven’t seen Oswald in ages, and my current commitment is a mental note to either think about the biased coin version or write some computer simulations next time I’m bored.
So, not quite an explanation, more of an exercise:
Oswald brings his laptop to a bar, loads up Matlab, and types:
p=rand(); c=0;
p is now a double between 0 and 1, which we can treat as continuously and uniformly distributed across that range. c is the number of times you’ve gotten a hint.
Now, Oswald types in another line:
c=c+1, rand()<p
This will both increase the number of hints you’ve received and print a 1 if a new, uniformly selected random number is smaller than the first one, and a 0 otherwise. (Basically, this is flipping a biased coin which gives ‘heads’ with probability p and ‘tails’ with probability 1-p.) You can repeat this line as many times as you like.
Now, this bar is called The Improper Prior, and as such is filled with Bayesians. It’s immediately obvious to the patrons that their posterior on p should be a beta distribution, with α equal to one plus the number of 1s and β equal to one plus the number of 0s.
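For concreteness, here’s a little sketch of how a patron might replay Oswald’s two lines and turn the results into posterior parameters. This is my guess at a script, not Oswald’s actual one, and the ten-hint loop is an arbitrary choice:
p=rand(); c=0; heads=0;        % Oswald's hidden bias, the hint counter, and a tally of 1s
for i=1:10                     % buy ten hints (an arbitrary number)
    c=c+1;
    heads=heads+(rand()<p);    % one flip of the biased coin
end
a=1+heads;                     % alpha: one plus the number of 1s
b=1+(c-heads);                 % beta: one plus the number of 0s
a/(a+b)                        % the posterior mean, to compare against p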
But now is when things get interesting: your chance of guessing p exactly is basically zero. So Oswald might instead reward you for guessing within .05 of the actual p. More hints should be penalized, either by shrinking the acceptable range or by decreasing the reward for guessing correctly. Alternatively, Oswald might reward you based on the precision of your posterior, or some other function.
Unfortunately, the beta distribution’s CDF is not pleasant to play with, but Matlab can deal with it easily; just type:
betainc(x,a,b)
We could determine the chance that your guess x is within .05 of the correct p by typing:
betainc(x+.05,a,b)-betainc(x-.05,a,b)
Unfortunately (again!), this isn’t maximized by centering your estimate at the mean unless a=b. You can test this with a=3, b=2, as in the sketch below.
And so if Oswald uses this reward system, we’ll have to solve an optimization problem to determine what our guess is at each stage, which isn’t going to be fun. (The dumb way to do it throws
betainc(x+.05,a,b)-betainc(x-.05,a,b);
into some nonlinear optimization algorithm which shifts around x until it finds a local maximum, starting with a/(a+b) as the guess. What’s the smart way to do it?)
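Here is a sketch of that dumb way; the values a=3, b=2 and the use of fminsearch are my own choices, and the min/max clamps just keep betainc’s argument inside [0,1]:
a=3; b=2;
% probability that p lands within .05 of the guess x
win=@(x) betainc(min(x+.05,1),a,b)-betainc(max(x-.05,0),a,b);
best=fminsearch(@(x) -win(x), a/(a+b))   % fminsearch minimizes, so hand it the negative
[win(best), win(a/(a+b))]                % the optimal guess beats guessing the mean
For a=3, b=2 the search drifts from the mean .6 toward the mode 2/3, which is the “you can test this” claim above.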
Oswald might also be reluctant to reward us based on precision, because that grows without bound as α and β increase. So instead let’s suppose he offers a flat reward, minus some constant times the variance, minus some constant times the number of hints we bought; he wants to know how to price entry into the game so he can set the expected profit where he wants it.
Now we’re in an interesting situation, because the variance can increase or decrease based on what we’ve seen. If you get two heads in a row, the posterior is Beta(3,1) and its variance is .0375; a tails would raise it to .04, while a third heads would lower it to about .027. On average, you expect the variance after you see another coin to be .03. On average, the variance should always decrease after we get another hint. We also know that the amount each hint is expected to lower our variance will be a decreasing function of α and β for large enough values. (Really? Why would you believe those two statements?)
We can now easily calculate the actual variance and the expected variance after another hint for any (α,β) pair. If the costs are fixed, we can determine when it wouldn’t be worthwhile to buy one more. If α and β are large enough, that will be enough to tell us to stop, since future hints will be worth even less than the current one and the current one is already a bad buy.
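A minimal sketch of those two quantities; the helper names bvar and expvar are mine:
bvar=@(a,b) a*b/((a+b)^2*(a+b+1));                      % posterior variance of Beta(a,b)
expvar=@(a,b) a/(a+b)*bvar(a+1,b)+b/(a+b)*bvar(a,b+1);  % expected variance after one more hint
[bvar(3,1), expvar(3,1)]                                % the two-heads example: 0.0375 and 0.03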
We can then propagate backwards from the terminal states to determine the total value of playing the game optimally. We can also be certain this game-valuation procedure will terminate in reasonable time for reasonable choices of the penalty parameters. (Again, why?)
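One crude way to run that backward pass, as a sketch rather than the intended method: cap the number of hints at some N large enough that buying is certainly a bad idea there, and fold back toward the empty state. The cap, the flat reward, and the penalty constants below are all made-up placeholders.
N=200; reward=5; k_var=10; k_hint=0.05;    % made-up cap, flat reward, and penalty constants
bvar=@(a,b) a*b/((a+b)^2*(a+b+1));
V=zeros(N+1,N+1);                          % V(a,b): value of playing optimally from posterior Beta(a,b)
for n=N:-1:0                               % n = a+b-2, the number of hints bought so far
  for a=1:n+1
    b=n+2-a;
    stopval=reward-k_var*bvar(a,b)-k_hint*n;         % payout if we stop and guess now
    if n==N
      V(a,b)=stopval;                                % forced to stop at the cap
    else
      buyval=a/(a+b)*V(a+1,b)+b/(a+b)*V(a,b+1);      % expected value of buying one more hint
      V(a,b)=max(stopval,buyval);
    end
  end
end
V(1,1)                                     % expected payout from the uniform prior
Oswald could then price entry around V(1,1), shifted by whatever expected profit he’s after.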