I believe that the analysis of this problem can be made more mathematically rigorous than is done in this post. Not only will a formal analysis help us avoid problems in our reasoning, but it will clearly illustrate what assumptions have been made (so we can question their legitimacy).
Let’s assume (as is done implicitly in the post) that you know with 100% certainty that the only two possible payouts are $1 million and $0. Then:
expected earnings = p($1 million payout) * $1 million + p($0 payout) * $0 - (ticket price)
= p($1 million payout) * $1 million - (ticket price)
= p($1 million payout|correctly computed odds) * p(correctly computed odds) * $1 million
+ p($1 million payout|incorrectly computed odds) * p(incorrectly computed odds) * $1 million
- (ticket price)
= (1/40,000,000) * p(correctly computed odds) * $1 million
+ p($1 million payout|incorrectly computed odds) * (1 - p(correctly computed odds)) * $1 million
- (ticket price)
We note now that we can write:
p($1 million payout|incorrectly computed odds) * (1 - p(correctly computed odds)) * $1 million
= p($1 million payout|incorrectly computed odds) * $1 million * (1 - p(correctly computed odds))
= (p($1 million payout|incorrectly computed odds) * $1 million + p($0 payout|incorrectly computed odds) * $0) * (1 - p(correctly computed odds))
= (expected payout given incorrectly computed odds) * (1 - p(correctly computed odds))
Hence, our resulting equation is:
expected earnings = (1/40,000,000) * p(correctly computed odds) * $1 million
+ (expected payout given incorrectly computed odds) * (1 - p(correctly computed odds))
- (ticket price)
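To make the equation above concrete, here is a minimal Python sketch of it. The 1/40,000,000 odds and the $1 million payout come from the discussion; the values used for p(correctly computed odds), the expected payout given incorrectly computed odds, and the ticket price are made-up numbers for illustration only, and the function and parameter names are my own.

```python
def expected_earnings(p_correct, p_win_if_correct, jackpot,
                      payout_if_incorrect, ticket_price):
    """Expected earnings, split on whether the odds were computed correctly."""
    # Branch 1: the odds calculation is right, so the stated odds apply.
    correct_branch = p_correct * p_win_if_correct * jackpot
    # Branch 2: the odds calculation is wrong; all we use here is the
    # expected payout in that case, whatever it happens to be.
    incorrect_branch = (1.0 - p_correct) * payout_if_incorrect
    return correct_branch + incorrect_branch - ticket_price


# Illustrative numbers only: p_correct, payout_if_incorrect and ticket_price
# are assumptions, not figures from the post.
value = expected_earnings(
    p_correct=0.99,
    p_win_if_correct=1 / 40_000_000,
    jackpot=1_000_000,
    payout_if_incorrect=0.50,
    ticket_price=1.00,
)
print(f"expected earnings per ticket: {value:.5f}")
```

Even with these arbitrary inputs, the second branch can dominate the first whenever the payout given an error is not tiny, which is why the assumption about that term matters so much.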
Now, under the fairly reasonable (but not quite true) assumption (which seems to be implicitly made by the author) that
(expected payout given incorrectly computed odds) = (expected payout given that we know nothing except that we are dealing with a lotto that costs (ticket price) to play)
we can convert to the notation of the article, which gives us:
E(L) = p(C) p(L) j + (1 - p(C)) * (e + t) - t
Here I have interpreted e as the expected value given that we are dealing with a lotto that we know nothing else about (rather than expected earnings under those circumstances). The author describes e as an “expected payoff” but I don’t think that is really quite what was meant (unless “payoff” refers to total net payoff including the ticket price).
We can now rearrange this formula:
E(L) = p(C) p(L) j + (1 - p(C)) e + (1 - p(C)) t - t
= p(C) p(L) j + (1 - p(C)) e - p(C) t
= p(C) (p(L) j - t) + (1 - p(C)) e
which finally gets us to the author’s terminal formula.
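As a sanity check on the rearrangement, here is a small Python sketch comparing the form we started from with the form we ended at. It uses exact fractions so that the two sides can be compared with ==. The test values other than 1/40,000,000 and 1,000,000 are arbitrary, and the function names are mine, not the author's.

```python
from fractions import Fraction


def form_before(p_c, p_l, j, e, t):
    """E(L) = p(C) p(L) j + (1 - p(C)) (e + t) - t"""
    return p_c * p_l * j + (1 - p_c) * (e + t) - t


def form_after(p_c, p_l, j, e, t):
    """E(L) = p(C) (p(L) j - t) + (1 - p(C)) e"""
    return p_c * (p_l * j - t) + (1 - p_c) * e


# Arbitrary test points as (p_c, p_l, j, e, t); only the odds and the
# jackpot in the first row come from the discussion.
cases = [
    (Fraction(99, 100), Fraction(1, 40_000_000), Fraction(1_000_000),
     Fraction(-1, 2), Fraction(1)),
    (Fraction(1, 2), Fraction(1, 10), Fraction(50),
     Fraction(3), Fraction(2)),
    (Fraction(0), Fraction(1, 3), Fraction(7),
     Fraction(-4), Fraction(5)),
]

forms_agree = all(form_before(*c) == form_after(*c) for c in cases)
print("forms agree on all test cases:", forms_agree)
```

Agreement at a few points is not a proof, of course; the algebra above is the proof, and this only guards against a slip in transcription.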
What is the point of doing this careful, formal analysis? Well, we now see where the author’s formula comes from explicitly, it is proven rigorously, and we are fully aware of what assumptions were made. The assumptions are:
You know with 100% certainty that the only two possible payouts are $1 million and $0
and
expected payout given incorrectly computed odds = expected payout given that we know nothing except that we are dealing with a lotto that costs the given ticket price to play
The first assumption is reasonable assuming that the lotto is not fraudulent, you don’t have problems reading the rules, it is not possible for multiple people to claim the payout, etc.
The second assumption, however, is harder to justify. There are many ways that a calculation of odds could go wrong (putting a decimal point in the wrong place, making a multiplication error, unknowingly misunderstanding the laws of probability, actually being insane, etc.). If we could really enumerate all of them, understand how they affect our computed payout probability, and estimate the probability of each occurring, then we could compute this missing factor exactly. As things stand, though, that is probably untenable. Nor should we expect errors that make the payout probability artificially larger to balance those that make it artificially smaller. Misplacing a decimal point, for example, will almost certainly be noticed if it leads to a probability greater than 100%, but not if it leads to one below that, which creates an asymmetry.
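The decimal-point asymmetry can be illustrated with a toy enumeration in Python. Everything here is an assumption made for the sake of the example: the "true" probability of 2%, the set of possible decimal shifts, and the rule that a slip is caught only when the result exceeds 100%. None of it is meant as a model of how people actually make errors.

```python
from fractions import Fraction


def surviving_errors(p_true, shifts):
    """Return the mistaken probabilities that would go unnoticed.

    Assumption: a misplaced decimal point is caught only when the
    resulting "probability" exceeds 1 (that is, 100%).
    """
    mistaken = [p_true * Fraction(10) ** k for k in shifts]
    return [p for p in mistaken if p <= 1]


p_true = Fraction(2, 100)   # hypothetical true probability (2%)
shifts = [-2, -1, 1, 2]     # decimal point moved one or two places either way

unnoticed = surviving_errors(p_true, shifts)
too_small = sum(1 for p in unnoticed if p < p_true)
too_large = sum(1 for p in unnoticed if p > p_true)

print("unnoticed underestimates:", too_small)
print("unnoticed overestimates: ", too_large)
```

In this toy setup the shifts are symmetric, but the errors that survive the "greater than 100%" check are not, so the unnoticed mistakes lean toward underestimating the probability.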
This is a valid point, and one I missed in my writeup. (Toby_Ord said something similar, but that was in response to a specific question.)
It is probably a useful skill to recognize asymmetries in the possible direction of error, such as that which you pointed out. I can see two ways to handle this:
a. Additional terms in the derivation, such as P(decimal-point error) and P(sign error), with the e term restricted to the unanticipated-error case.
b. Modification of e.
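A rough Python sketch of what option (a) might look like follows. This is only one way to read it: each anticipated error type gets its own probability and its own net expected value, and e is applied only to whatever probability is left over. The error labels, probabilities, and values below are hypothetical placeholders.

```python
def expected_value_with_named_errors(p_correct, p_win, jackpot, ticket_price,
                                     named_errors, e_unanticipated):
    """E(L) with separate terms for anticipated error types.

    named_errors maps a label to (probability of that error,
    net expected value of the ticket given that error).
    e_unanticipated plays the role of e, restricted to errors we did
    not think to list.
    """
    p_named = sum(p for p, _ in named_errors.values())
    p_unanticipated = 1.0 - p_correct - p_named
    if p_unanticipated < 0:
        raise ValueError("probabilities sum to more than 1")

    total = p_correct * (p_win * jackpot - ticket_price)
    total += sum(p * v for p, v in named_errors.values())
    total += p_unanticipated * e_unanticipated
    return total


# Hypothetical numbers, for illustration only.
example = expected_value_with_named_errors(
    p_correct=0.99,
    p_win=1 / 40_000_000,
    jackpot=1_000_000,
    ticket_price=1.00,
    named_errors={
        "decimal-point error": (0.004, -0.90),
        "sign error": (0.001, -1.00),
    },
    e_unanticipated=-0.50,
)
print(f"E(L) with named error terms: {example:.5f}")
```

With an empty named_errors dictionary this reduces to p(C) (p(L) j - t) + (1 - p(C)) e, so it is an extension of the formula in the comment above rather than a replacement for it.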