Suppose that a coin has been tossed and come to rest, but I have not looked at it yet. What probability should I assign to the outcomes of heads and tails? It seems to me your analysis would say that the coin actually has either a 100% probability of being heads or a 100% probability of being tails, not a 50% probability of each.
This contradicts the entire idea of probability as being a measure of one’s own imperfect information. Is this your intention? If not, what is the difference between the assignment of probability to the unobserved but already existing coin toss and to the unobserved but already existing amount in the envelope?
Thank you for responding. This is indeed a very tricky issue, and I was looking for a sounding board… anyone who could challenge me in order to help me to clarify my explanation. I didn’t expect so many haters in this forum, but the show must go on with or without them.
My undergraduate degree is in math, and mathematicians sometimes use the phrase “without loss of generality” (WLOG). Every once in a while they will make a semi-apologetic remark about the phrase because they all know that, if it were ever to be used in an inappropriate way, then everything could fall apart. Appealing to WLOG is not a cop-out but rather an attempt to tell those who are evaluating the proof, “Tell me if I’m wrong.”
In your example of a coin flip, I can find no loss of generality. However, in the two envelopes problem, I can. If step (1) of the argument had said “unselected envelope” rather than “selected envelope”, then the argument would have led the player to keep the selected envelope rather than switch it. Why should the argument using the words “selected envelope” be more persuasive than the argument using the words “unselected envelope”? Do you see what I mean? There is an implicit “WLOG” but, in this case, with an actual loss of generality.
This problem still leaves me feeling very troubled because, even to the extent that I understand the fallacy, it still seems very difficult for me to know whether I have explained it in a way that leaves absolutely no room for confusion (which is very rare when I see an actual error in somebody’s reasoning). And apparently, I was not able to explain the fallacy in a way that others could understand. As far as I’m concerned, that’s a sign of a very dangerous fallacy. And I’ve encountered some very deep and dangerous fallacies. So, this one is still quite disturbing to me.
To follow a maxim of Edwin Jaynes, when a paradox arises in matters of probability, one must consider the generating process from which the probabilities were derived.
How does the envelope-filler choose the amount to put in either envelope? He cannot pick an “arbitrary” real number. Almost all real numbers are so gigantic as to be beyond human comprehension. Let us suppose that he has a probability distribution over the non-negative reals from which he draws a single value x, and puts x into one envelope and 2x into the other. (One could also imagine that he puts x/2 into the other, or tosses a coin to decide between 2x and x/2, but I’ll stick with this method.)
Any such probability distribution must tail off to zero as x becomes large. Suppose the envelope-chooser is allowed to open the first envelope, and then is allowed to switch to the other one if they think it’s worth switching. The larger the value they find in the first envelope, the less likely it is that the other envelope has twice as much. Similarly, if they find a very small value in the first envelope (i.e. well into the lower tail of the distribution), then they can expect to profit by switching.
In the original version, of course, they do not see what is in the envelope before deciding whether to switch. So we must consider the expected value of switching conditional on the value in the first envelope, summed or integrated over the probability distribution of what is in that envelope.
I shall work this through with an example probability distribution. Suppose that the probability of the chosen value being $x_n = 2^n$ is $p_n = 2 \cdot 3^{-n}$ for all positive integers $n$, and no other value of $x$ is possible. (Taking $p_n = 2^{-n}$ would be simpler, but that distribution has infinite expected value, which introduces its own paradoxes.)
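As a quick sanity check on this choice of distribution (a minimal Python sketch, not part of the argument, with the infinite sum truncated at an arbitrary cutoff), the probabilities sum to 1 and the expected value of the drawn amount is finite, coming to 4:

```python
from fractions import Fraction

# The filler's draw: x_n = 2^n with probability p_n = 2 * 3^(-n), n = 1, 2, 3, ...
# Truncate the infinite sum at N; the neglected tail decays geometrically.
N = 60
dist = {2**n: Fraction(2, 3**n) for n in range(1, N + 1)}

total_prob = sum(dist.values())                      # approaches 1
expected_draw = sum(x * p for x, p in dist.items())  # approaches 4

print(float(total_prob), float(expected_draw))  # ~1.0 ~4.0
```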
I shall list all the possible ways the game can play out.
1. $2 in the envelope in your hand, $4 in the other. Probability $p_1 = 2/3$ for selecting the value $x_1 = 2$, and $1/2$ for picking up the envelope containing $x$, so $1/3$. Value of switching is $x_2 - x_1 = x_1$, so the contribution of this possibility to the expected value of switching is $(p_1/2)\,x_1 = 2/3$.
2. $4 in your hand, $2 in the other. Probability $p_1/2 = 1/3$, value of switching $= -x_1$, expectation $-(p_1/2)\,x_1 = -2/3$.
3. $4 in your hand, $8 in the other. Probability $p_2/2 = 1/9$, value of switching $= x_2 = 4$, expectation $(p_2/2)\,x_2 = 4/9$.
4. $8 in your hand, $4 in the other. Probability $p_2/2 = 1/9$, value of switching $= -x_2 = -4$, expectation $-(p_2/2)\,x_2 = -4/9$.
And so on. Now, we can pair these up as 2/3−2/3, 4/9−4/9, 8/27−8/27, etc. and see that the expected value of switching without knowledge of the first envelope’s contents is zero. But that is just the symmetry argument against switching. To dissolve the paradoxical argument that says that you should always switch, we pair up the outcomes according to the value in the first envelope.
If it has $2, the value of switching is 2/3.
If it has $4, the value is −2/3+4/9=−2/9.
If it has $8, the value is −4/9+8/27=−4/27.
The sum of all of the negative terms is −2/3, cancelling out the positive one. The expected value is zero.
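To make the bookkeeping above concrete, here is a short Python enumeration (a sketch only, truncated at a finite $n$; the omitted tail is geometrically small) that reproduces these groupings by the amount in the first envelope and shows the total tending to zero:

```python
from fractions import Fraction
from collections import defaultdict

N = 40  # truncation point; the omitted tail is geometrically small

# contribution[v] = sum, over all outcomes in which the first envelope holds v,
# of (probability of that outcome) * (gain from switching).
contribution = defaultdict(Fraction)

for n in range(1, N + 1):
    p_n = Fraction(2, 3**n)  # probability the filler draws x_n = 2^n
    x_n = 2**n
    contribution[x_n] += (p_n / 2) * x_n      # you hold the smaller amount: gain +x_n
    contribution[2 * x_n] -= (p_n / 2) * x_n  # you hold the larger amount: gain -x_n

print(contribution[2], contribution[4], contribution[8])  # 2/3 -2/9 -4/27
print(float(sum(contribution.values())))                  # ~0.0
```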
The general term in this sum is, for $k \ge 2$, $-(p_{k-1}/2)\,x_{k-1} + (p_k/2)\,x_k = -(3/4)\,p_k x_k + (1/2)\,p_k x_k = -(1/4)\,p_k x_k$, which is negative. The value conditional on observing $x_k$ in the first envelope is just this divided by $(p_{k-1} + p_k)/2$, which leaves it still negative. If we write $\alpha = p_{k-1}/(p_{k-1} + p_k)$ and $\beta = p_k/(p_{k-1} + p_k)$, this works out to $\alpha = 3/4$ and $\beta = 1/4$. The expected value given that the first envelope contains $x_k$ is then $-\alpha x_{k-1} + \beta x_k$. Observe how this weights the negative value three times as heavily as the positive value, but the positive value is only twice as large.
Compare with the argument for switching, which instead computes the expected value as $(-x_{k-1} + x_k)/2 = (1/4)\,x_k$, which is positive. It is neglect of the distribution from which $x$ was drawn that leads to this wrong calculation.
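Spelling out the conditional expectation for this particular distribution (for $k \ge 2$, using $p_{k-1} = 3p_k$ and $x_{k-1} = x_k/2$, both of which follow from the definitions above):

$$
E[\text{gain from switching} \mid \text{first envelope contains } x_k]
= \frac{-(p_{k-1}/2)\,x_{k-1} + (p_k/2)\,x_k}{(p_{k-1} + p_k)/2}
= -\tfrac{3}{4}\cdot\tfrac{x_k}{2} + \tfrac{1}{4}\,x_k
= -\tfrac{x_k}{8},
$$

whereas the naive argument gives $+x_k/4$.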
I worked this through for just one distribution, but I expect that a general proof can be given, at least for all distributions for which the expected value of x is finite.
Thanks for offering that solution. It seems appropriate to me. I think that the issue at stake is related to the difference in programming language semantics between a probabilistic and a nondeterministic semantics. Once you have decided on a nondeterministic semantics, you can’t simply start adding in probabilities and expect it to make sense. So, your solution suggests that we should have grounded the entire problem in a probability distribution, whereas I was saying that, because we hadn’t done that, we couldn’t legitimately add probabilities into the picture at a later step. I wasn’t ruling out the possibility of a solution like yours, and it would indeed be interesting to know whether yours can be generalized in any way. In a prior draft of this post, I actually suggested that we could introduce a random variable before the envelope was chosen (although I hadn’t even attempted to work out the details). It was only for the sake of brevity that I omitted that idea.
My interest is more in the philosophy of language and how language can be deceptive (which is clearly happening in some way in the statement of this problem) and what we can do to guard ourselves against that. What bothers me is that, even when I claimed to have spotted where and how the false step occurred, nobody wanted to believe that I spotted it, or at least they didn’t believe that it mattered. That’s rather disturbing to me because this problem involves a relatively simple use of language. And I think that humans are in a bit of trouble if we can’t even get on the same page about something this simple… because we’ve got very serious problems right now in regard to A.I. that are much more complicated and tricky to deal with than this one.
But I do like your solution, and I’m glad that it’s documented here if nowhere else.
And for anyone who reads this, I apologize if the tone of my post was off-putting. I deliberately chose a slightly provocative title simply to draw attention to this post. I don’t mind being corrected if I’m mistaken or have misspoken.
Here’s the general calculation.
Take any probability distribution defined on the set of all values (x,y) where x and y are non-negative reals and y=2x. It can be discrete, continuous, or a mixture.
Let p(x) be the marginal distribution over x. This method of defining p avoids the distinction between choosing x and then doubling it, or choosing y and then halving it, or any other method of choosing (x,y) such that y=2x.
Assume that p has an expected value, denoted by E(p).
The expected value of switching when the amount in the first envelope is in the range $[x - dx/2,\, x + dx/2]$ consists of two parts:
(i) The first envelope contains the smaller amount. This has probability $p(x)\,dx/2 + O(dx^2)$. The division by 2 comes from the 50% chance of choosing the envelope with the smaller amount.
(ii) The first envelope contains the larger amount. This has probability $p(x/2)\,dx/4 + O(dx^2)$. The extra factor of 2 comes from the fact that when the contents are in an interval of length $dx$, half of that (the amount chosen by the envelope-filler) is in an interval of length $dx/2$.
In the two cases the gain from switching is respectively x+O(dx) or −x/2+O(dx).
The contribution to the expected gain from contents in this range, per unit $dx$, is therefore $x\,p(x)/2 - (x/2)\,p(x/2)/4 + O(dx)$.
Multiply this by $dx$, let $dx$ tend to 0 (eliminating the $O(dx^2)$ term) and integrate over the non-negative reals:
$$
\int_0^\infty \Bigl( \frac{x\,p(x)}{2} - \frac{(x/2)\,p(x/2)}{4} \Bigr)\,dx
= \int_0^\infty \frac{x\,p(x)}{2}\,dx - \int_0^\infty \frac{(x/2)\,p(x/2)}{4}\,dx
$$
The first integral is $E(p)/2$. In the second, substitute $y = x/2$ (therefore $dx = 2\,dy$), giving $\int_0^\infty y\,p(y)/2\,dy = E(p)/2$. The two integrals cancel.
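As a numerical illustration of this cancellation (a sketch only, with an assumed exponential density $p(x) = e^{-x}$, so $E(p) = 1$ and each integral should come to $E(p)/2 = 0.5$):

```python
import math

# Assumed example density for the filler's draw: p(x) = exp(-x), with E(p) = 1.
def p(x):
    return math.exp(-x)

# Crude Riemann sums for the two integrals in the text; the tail beyond x = 60
# is negligible for this density.
dx = 1e-4
xs = [i * dx for i in range(1, int(60 / dx))]

first = sum(x * p(x) / 2 for x in xs) * dx             # integral of x p(x)/2
second = sum((x / 2) * p(x / 2) / 4 for x in xs) * dx  # integral of (x/2) p(x/2)/4

print(first, second, first - second)  # ~0.5 ~0.5 ~0.0
```

Any other density with a finite mean would do; the cancellation does not depend on the exponential choice.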