A magician asks you to choose any infinite sequence of 0s and 1s, and to start reciting it until they say stop. They will then make a prediction of the form “p% of the next n digits will be 0”, and they will be correct to within 1% at least 99% of the times they perform the trick. How is the trick done?
h/t Alex Arkhipov
A solution:
We’ll describe a strategy for the magician that stops you sometime within the first $2^k$ bits and gets an expected squared error of at most $1/k$ regardless of your string. If we pick $k = 10^6$, the strategy has an expected squared error of at most $10^{-6}$; by Markov's inequality, the squared error exceeds $(1\%)^2 = 10^{-4}$ at most 1% of the time, so the prediction is within 1% at least 99% of the time. This requires the magician to wait for up to $2^{10^6}$ bits. I suspect that’s roughly optimal but haven’t checked the construction carefully.
The strategy is to pick a uniformly random $0 \le \ell \le k-1$, then pick a uniformly random $0 \le m \le 2^{k-\ell-1}-1$. Then the magician stops you after $(2m+1)2^\ell$ steps, and guesses that the next $2^\ell$ bits will have the same average as the preceding $2^\ell$ bits.
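To make the sampling concrete, here is a minimal Python sketch of the magician's side. The function names `magician_strategy` and `predict` are my own, not from the original:

```python
import random

def magician_strategy(k):
    """Sample a stopping time and prediction horizon for the trick."""
    l = random.randrange(k)                 # horizon exponent, 0 <= l <= k-1
    m = random.randrange(2 ** (k - l - 1))  # which odd multiple of 2^l to stop at
    stop_after = (2 * m + 1) * 2 ** l       # stop after this many recited bits
    horizon = 2 ** l                        # predict the next 2^l bits
    return stop_after, horizon

def predict(seen_bits, horizon):
    """Guess that the next `horizon` bits average the same as the last `horizon`."""
    recent = seen_bits[-horizon:]
    return sum(recent) / horizon  # predicted fraction of 1s

stop, h = magician_strategy(6)
assert h <= stop <= 2 ** 6 - h  # always leaves room for the next 2^l bits
```

Note that the stopping time is always an odd multiple of the horizon, so the "preceding $2^\ell$ bits" and "next $2^\ell$ bits" are aligned dyadic blocks.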
(Probably it would also work to pick a random multiple of $2^\ell$ instead of an odd multiple, and it would probably work to just pick the horizon and stopping point at random from suitable distributions. The exact choices here were just to make the analysis work out, and may be an artifact of me being more comfortable with Boolean Fourier analysis than real Fourier analysis.)
The intuition is that you can try to trick the magician in many ways, but you can’t do all of them at once:
If the magician guesses that the next bit will be equal to the last, then to fool them you could make each bit different from the last.
But if your bitstring flipped most of the time, then it would average to 1/2 over any long time period.
So to stymie that, you’d need to ensure that you also include some “low frequency” variation where the most common symbol flips from 0 to 1 every 1000 steps (say).
But then the magician could zoom out to every million steps, and the average would be closer to 50-50. So we need even-lower-frequency variation.
But we only have so much variance to go around. So eventually the magician is guaranteed to be OK: if they choose from at least a million exponentially decreasing frequencies, and then guess that there can’t be much oscillation at that frequency, they will be roughly correct.
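A quick empirical illustration of the first two bullets, using the fully alternating string as the adversary:

```python
# The alternating string 0,1,0,1,... fools the "next bit equals last bit"
# guess every single time, but any aligned block of length 2^l averages
# to exactly 1/2, so a zoomed-out magician is never wrong.
bits = [i % 2 for i in range(1 << 12)]

# The single-bit guess is always wrong:
wrong = sum(bits[t] == bits[t - 1] for t in range(1, len(bits)))
assert wrong == 0

# But block averages at every dyadic scale are exactly 1/2:
for l in range(1, 12):
    block = 2 ** l
    for start in range(0, len(bits) - block, block):
        assert sum(bits[start:start + block]) / block == 0.5
```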
OK, now the actual analysis. Let’s consider your first $2^k$ bits as a function $f: \{0,1\}^k \to \{-1,1\}$ (note that we’ve replaced 0 with $-1$ just to make things balanced; I think that’s just personal preference). Define $\hat f$ as the Fourier transform of $f$, i.e. $\hat f(y) = \frac{1}{2^k} \sum_x (-1)^{x \cdot y} f(x)$. Then it’s easy to see that $f(x) = \sum_y (-1)^{x \cdot y} \hat f(y)$.
The fact that “there is only so much variance to go around” is captured by the following calculation: $$\sum_y \hat f(y)^2 = \sum_y \left(\frac{1}{2^k} \sum_x (-1)^{x \cdot y} f(x)\right)^2 = \frac{1}{2^{2k}} \sum_y \sum_x \sum_{x'} (-1)^{(x+x') \cdot y} f(x) f(x').$$ Then we can flip the order of the sums over $x$ and $y$, and note that $\sum_y (-1)^{(x+x') \cdot y} = 0$ unless $x = x'$. Thus $\sum_y \hat f(y)^2 = \frac{1}{2^k} \sum_x f(x)^2 = 1$.
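This is Parseval's identity for the Boolean Fourier transform; here is a small numerical check with an arbitrary $\pm 1$ function on $k = 4$ bits:

```python
import random

k = 4
N = 2 ** k
random.seed(1)
f = [random.choice([-1, 1]) for _ in range(N)]

def fhat(y):
    # f̂(y) = (1/2^k) * Σ_x (−1)^{x·y} f(x), with x·y the mod-2 dot product,
    # computed via the popcount of the bitwise AND of the integer encodings.
    return sum((-1) ** bin(x & y).count("1") * f[x] for x in range(N)) / N

total = sum(fhat(y) ** 2 for y in range(N))
assert abs(total - 1.0) < 1e-9  # the variance budget is exactly 1
```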
If we write down the magician’s loss in terms of the fourier coefficients, we’ll get just the bounds we wanted. This is a lot of symbols but it’s all pretty mechanical and standard.
If the magician picks $m$ and $\ell$, then their guess will be the average value of $f$ over all bitstrings that start with $m0$. Then that guess will be compared to the average of $f$ over all bitstrings that start with $m1$. I’m going to ignore factors of 2 in the error for simplicity. Then their squared error is: $$\left(\frac{1}{2^\ell} \sum_{z \in \{0,1\}^\ell} f(m0z) - \frac{1}{2^\ell} \sum_{z \in \{0,1\}^\ell} f(m1z)\right)^2.$$ We can expand each appearance of $f$ in terms of the Fourier transform $\hat f$, and group together like terms, to get: $$\left(\frac{1}{2^\ell} \sum_{z \in \{0,1\}^\ell} \sum_y \hat f(y) \left((-1)^{m0z \cdot y} - (-1)^{m1z \cdot y}\right)\right)^2.$$
Now we can flip the order of summation of $y$ and $z$. When we sum over $z$, the innermost term $(-1)^{m0z \cdot y} - (-1)^{m1z \cdot y}$ usually disappears. The two terms are opposite unless $y$ has a 1 in the $(\ell+1)$st coordinate from the end (the coordinate matching the bit that distinguishes $m0$ from $m1$). And the summation over $z$ will cancel everything out if $y$ has any non-zero entries in the final $\ell$ coordinates (those matching $z$). So this is non-zero if and only if the final 1 in $y$ is in position $\ell+1$. For such $y$, the sum over $z$ is equal to $2^\ell (-1)^{m0^{\ell+1} \cdot y}$.
Write $Y_\ell$ for the set of $y$ which have their last 1 in position $\ell+1$, and write $m \cdot y$ for $m0^{\ell+1} \cdot y$. Then we can write the squared loss as $$\sum_{y \in Y_\ell} \sum_{y' \in Y_\ell} \hat f(y) \hat f(y') (-1)^{m \cdot (y + y')}.$$ If we now take the average over the $2^{k-\ell-1}$ different values of $m$, we get $$\frac{1}{2^{k-\ell-1}} \sum_m \sum_{y \in Y_\ell} \sum_{y' \in Y_\ell} \hat f(y) \hat f(y') (-1)^{m \cdot (y + y')}.$$ Again, we can reverse the order of summation, and the inner sum adds up to 0 unless $y = y'$. So we get $\sum_{y \in Y_\ell} \hat f(y)^2$.
Now if we draw $\ell$ at random, we get an expected squared error of $$\frac{1}{k} \sum_\ell \sum_{y \in Y_\ell} \hat f(y)^2 = \frac{1}{k} \sum_{y \neq 0} \hat f(y)^2 \le \frac{1}{k},$$ since the sets $Y_\ell$ together cover every $y$ except $y = 0$. QED.
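The whole argument is easy to check by brute force for small $k$. The sketch below computes the strategy's exact expected squared error for an arbitrary string, working with $\{0,1\}$ predictions (where, as it happens, the dropped factors of 2 cancel exactly), and compares it against the Fourier-side expression $\frac{1}{k}\sum_{y \neq 0}\hat f(y)^2$:

```python
import random

# Exact expected squared error of the strategy for one arbitrary string:
# uniform l in 0..k-1, then uniform m; guess = mean of previous 2^l bits,
# compared against the mean of the next 2^l bits.
k = 6
N = 2 ** k
random.seed(0)
bits = [random.randint(0, 1) for _ in range(N)]  # adversary's first 2^k bits

err = 0.0
for l in range(k):
    block = 2 ** l
    errs = []
    for m in range(2 ** (k - l - 1)):
        t = (2 * m + 1) * block                 # stopping time
        guess = sum(bits[t - block:t]) / block  # mean of previous block (prefix m0)
        actual = sum(bits[t:t + block]) / block # mean of next block (prefix m1)
        errs.append((guess - actual) ** 2)
    err += sum(errs) / len(errs)                # average over m
expected_sq_err = err / k                       # average over l

# Fourier side: f is the ±1 version of the string.
f = [2 * b - 1 for b in bits]
fhat_sq = [
    (sum(f[x] * (-1) ** bin(x & y).count("1") for x in range(N)) / N) ** 2
    for y in range(N)
]
assert abs(expected_sq_err - sum(fhat_sq[1:]) / k) < 1e-9  # the identity above
assert expected_sq_err <= 1 / k + 1e-9                     # the promised bound
```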
(I sort of cheated in that I promised a $1/k$ error bound but then dropped some factors of 2. I think those factors exactly cancel with the difference between predicting $\pm 1$ and predicting $\{0,1\}$, but don’t hold me to that.)
I found this puzzle fun and I feel like I learned something useful about math when you gave it to me a few years ago.
Curious what the solution to this one is? Couldn’t figure it out
I posted it in another reply.
I’ll be thinking about this problem for the next hour or so unless something comes up, so I’ll be writing my thoughts down here if I think they are important.
A couple of preliminary observations:
The magician must be playing a mixed strategy. This is easy to see: otherwise, an adversary who knew the magician’s deterministic strategy would know when he stops and what he predicts, and could deliberately falsify the prediction no matter what.
It’s not enough for the prediction made after stopping to be random; the stopping time itself must also be random. This is because if I knew when the magician would stop, then from that moment on I would make my bit string either all 1s or all 0s, chosen at random, and the magician couldn’t win with probability above 50% no matter what.
The naive intuition that “it’s impossible because the magician can’t gain information about the future of an arbitrary bit string” is wrong, because the magician doesn’t actually need to gain information in a technical sense to make this prediction. Obviously if the bit strings are sampled uniformly at random from Cantor space $\{0,1\}^\infty$ (i.e., i.i.d. fair coin flips), then the magician could just guess that 50% of the next 1 million digits will be 0, and he would win even though he has no information about the string at all.
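For instance, a uniformly random string concentrates tightly around 50% over a million digits:

```python
import random

# One sample of a million i.i.d. fair coin flips. The standard deviation of
# the empirical mean is about 0.0005, so the blind "50%" guess is within 1%
# with overwhelming probability.
random.seed(0)
n = 10 ** 6
mean = sum(random.randint(0, 1) for _ in range(n)) / n
assert abs(mean - 0.5) < 0.01
```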
I don’t have a solution, but I’ve done some analysis to try to make a strategy of the following form work: pick the number of steps at which you halt from a distribution $\sim 1/k^{1+\varepsilon}$ and then guess that the mean of the next $k$ terms will be equal to the mean of the last $k$ terms. I tried to get a cancellation in some $L^2$ error sum but unfortunately it went nowhere.
I think this strategy doesn’t work, and I’m not even sure it’s a good idea for $k$ not to depend on the sequence the magician has seen so far, but I couldn’t see a simple argument for why it would be impossible, so I tried to make it work anyway.
This seems like it nearly works, but
The magician needs to separately pick “how long a horizon to look over” and “when to stop.” Otherwise I can just have a string with exponentially increasing runs of 0s and 1s, and you’ll usually be pretty wrong if you use the first k to predict the next k.
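A small demonstration of this counterexample, with run lengths doubling and the stopping time tied to the horizon (stopping at run boundaries for concreteness):

```python
# Run r has length 2^r and constant value r % 2, so runs grow exponentially.
bits = []
for r in range(12):
    bits.extend([r % 2] * (2 ** r))

# If the horizon is tied to the stopping time (predict the next t bits from
# the t bits seen so far), then stopping at t = 2^j - 1 (a run boundary)
# goes badly for every j: the next t bits sit inside one constant run, while
# the bits seen so far average somewhere between roughly 1/3 and 2/3.
for j in range(2, 10):
    t = 2 ** j - 1
    guess = sum(bits[:t]) / t          # mean of the t bits seen so far
    actual = sum(bits[t:2 * t]) / t    # mean of the next t bits
    assert abs(guess - actual) > 0.25  # off by more than 25 percentage points
```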
Is the following a correct reformulation of your problem?
Say that a magician’s pure strategy is a function on finite bitstrings which returns either “go on” or “stop”, the latter together with a positive integer n and a predicted frequency f of zeros.
A magician’s mixed strategy is a probability distribution over magician’s pure strategies. (Btw, I’m not sure what kind of sigma-algebra is suitable here.)
A player’s mixed strategy is a probability distribution over infinite bitstrings.
A magician’s mixed strategy together with a player’s mixed strategy define, in the obvious way, a probability distribution over outcomes “magician wins” (i.e., the magician’s prediction was within 1%) and “player wins”.
You’re claiming that there is a magician’s mixed strategy s.t. for every player’s mixed strategy, magician wins with probability at least 0.99.
I think you can slightly simplify to say: the magician has a probability distribution over strategies, such that for every sequence the magician wins with probability 0.99.
Yes, indeed.