Well, Thrun’s algorithm does better than 50% for every distribution. But no matter what f(x) we choose, there will always be distributions that make the chance of winning arbitrarily close to 50% (say, less than 50%+epsilon). To see why, note that for a given f(x) we can construct a distribution far enough from zero that all values of f(x) there are less than epsilon; then the chance of switching prescribed by the algorithm is also less than epsilon, and the advantage over 50% (at most half the difference between the switching chances at the two amounts) is smaller still.
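For concreteness, here is a small Monte Carlo sketch of that claim. The particular f(x) = exp(-x), the dollar amounts, and the helper names are my own illustrative choices, not anything from Thrun’s or Eliezer’s setup: a decreasing f beats 50% when the amounts live near zero, and drifts back toward 50% once the distribution lives far from zero.

```python
import math
import random

def f(x):
    # Illustrative switching probability: monotonically decreasing in the
    # observed amount, as in Thrun's scheme. The exact shape is an assumption.
    return math.exp(-x)

def win_rate(smaller_amounts, trials=200_000):
    # Estimate the chance of ending up with the larger envelope when the
    # smaller amount is drawn uniformly from `smaller_amounts`.
    wins = 0
    for _ in range(trials):
        a = random.choice(smaller_amounts)      # envelopes hold a and 2*a
        observed = random.choice([a, 2 * a])    # open one envelope at random
        switch = random.random() < f(observed)  # switch with probability f(observed)
        final = {a: 2 * a, 2 * a: a}[observed] if switch else observed
        wins += final == 2 * a
    return wins / trials

print(win_rate([1, 2, 3]))           # comfortably above 0.5
print(win_rate([1001, 1002, 1003]))  # f is essentially 0 here, so close to 0.5
```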
The next question is whether we can find any other randomized algorithm that does better than 50%+epsilon on every distribution. The answer to that is also no.
1) Note that any randomized algorithm must decide whether to switch or not, based on the contents of the envelope and possibly a random number generator. In other words, it must be described by a function f(x) like in Thrun’s algorithm. f(x) doesn’t have to be monotonic, but must lie between 0 and 1 inclusive for every x.
2) For every such function f(x), we will construct a distribution of envelopes that makes it do worse than 50%+epsilon.
3) Let’s consider for each number x the degenerate distribution D_x that always puts x dollars in one envelope and 2*x in the other.
4) To make the algorithm do worse than 50%+epsilon on distribution D_x, we need the chance of switching at 2*x to be not much lower than the chance of switching at x. On D_x the algorithm’s chance of winning is 50% + (f(x)-f(2*x))/2, so it’s enough to have the condition f(2*x)>f(x)-epsilon.
5) Now we only need to prove that there exists an x such that f(2*x)>f(x)-epsilon. We will prove that by reductio ad absurdum. If we had f(2*x)≤f(x)-epsilon for every x, we could iterate that and obtain f(x)≥f(x*2^n)+n*epsilon for every x and n, which would make f(x) greater than 1. Contradiction, QED.
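To see step 5 in action, here is a small sketch (again with a hypothetical f of my own choosing): walking up the chain x, 2*x, 4*x, … must reach a point where f(2*x)>f(x)-epsilon within roughly 1/epsilon doublings, because otherwise f would have to exceed 1.

```python
import math

def adversarial_x(f, epsilon, x0=1.0):
    # Walk the chain x0, 2*x0, 4*x0, ... looking for a point where
    # f(2*x) > f(x) - epsilon. Since 0 <= f <= 1, the argument in step 5
    # guarantees success within roughly 1/epsilon doublings.
    x = x0
    for _ in range(int(1 / epsilon) + 1):
        if f(2 * x) > f(x) - epsilon:
            return x  # the degenerate distribution D_x defeats this f
        x *= 2
    raise AssertionError("unreachable if 0 <= f(x) <= 1 for all x")

# Hypothetical decreasing switching function, for illustration only:
print(adversarial_x(lambda x: math.exp(-x), epsilon=0.01))  # returns 8.0 for this f
```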
Yes, that all looks sensible. The point I’m trying to get at—the one I think Eliezer was gesturing towards—was that for any f and any epsilon, f(x) - f(2x) < epsilon for almost all x, in the formal sense. The next step is less straightforward—does it then follow that, prior to the selection of x, our expectation for getting the right answer is 50%? This seems to be Eliezer’s implication. However, it seems also to rest on an infinite uniform random distribution, which I understand can be problematic. Or have I misunderstood?
That’s called an improper prior. Eliezer mentions in the post that it was his first idea, but turned out to be irrelevant to the analysis.
So I guess we’re back to square one, then.
I don’t understand. Which part are you still confused about? To me the whole thing seems quite clear.
How did Eliezer determine that the expected benefit of the algorithm over random chance is zero?
He didn’t say that, he said the benefit gets closer and closer to zero if you modify the setup in a certain way. I couldn’t find an interpretation that makes his statement correct, but at least it’s meaningful.
I don’t get why it makes sense to say

“the algorithm did make use of prior knowledge about the envelope distribution. (As the density of the differential of the monotonic function, in the vicinity of the actual envelope contents, goes to zero, the expected benefit of the algorithm over random chance, goes to zero.)”
without meaning that the expected density of the differential does go to zero—or perhaps would go to zero barring some particular prior knowledge about the envelope distribution. And that doesn’t sound like “modifying the setup” to me, that seems like it would make the statement irrelevant. What exactly is the “modification”, and what did you decide his statement really means, if you don’t mind?
Sorry, are you familiar with the mathematical concept of limit? Saying that “f(x) goes to zero as x goes to zero” does not imply the nonsensical belief that “x goes to zero”.
Yes, I am familiar with limits. What I mean is—if you say “f(x) goes to zero as x goes to zero”, then you are implying (in a non-mathematical sense) that we are evaluating f(x) in a region about zero—that is, we are interested in the behavior of f(x) close to x=0.
Edit: More to the point, if I say “g(f(x)) goes to zero as f(x) goes to infinity”, then f(x) better not be (known to be) bounded above.