Shortcuts With Chained Probabilities

jefftk Feb 18, 2021, 2:00 AM

15 points

Let’s say you’re considering an activity with a risk of death of one in a million. If you do it twice, is your risk two in a million?

Technically, it’s just under:

1 - (1 − 1/1,000,000)^2 = ~2/1,000,001

This is quite close! Approximating 1 - (1-p)^2 as p*2 was only off by 0.00005%.
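As a rough check of that figure, here is a minimal Python sketch. The function names `exact_twice` and `naive_twice` are made up for illustration and are not from the post.

```python
def exact_twice(p):
    """Chance of the event happening at least once in two independent tries."""
    return 1 - (1 - p) ** 2


def naive_twice(p):
    """The shortcut: just double the single-try probability."""
    return 2 * p


p = 1 / 1_000_000
exact = exact_twice(p)
naive = naive_twice(p)

# How far the shortcut overshoots, relative to the exact answer.
relative_error = (naive - exact) / exact

print(f"exact          = {exact:.12e}")
print(f"naive          = {naive:.12e}")
print(f"relative error = {relative_error:.5%}")
```

Floating-point rounding affects the last few digits here, so treat the printed values as approximate.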

On the other hand, say you roll a die twice looking for a 1:

1 - (1 − 1/6)^2 = ~31%

The approximation would have given:

1/6 * 2 = ~33%

Which is off by about 8% of the estimate (roughly 3 percentage points). And if we flip a coin twice looking for a tails:

1/2 * 2 = 100%

Which is clearly wrong since you could get heads twice in a row.
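The die and coin cases can be compared side by side with exact fractions. This is only a sketch; the helper name `at_least_once_in_two` is hypothetical.

```python
from fractions import Fraction


def at_least_once_in_two(p):
    """Exact chance of at least one success in two independent tries."""
    return 1 - (1 - p) ** 2


cases = [("die, looking for a 1", Fraction(1, 6)),
         ("coin, looking for tails", Fraction(1, 2))]

for name, p in cases:
    exact = at_least_once_in_two(p)
    naive = 2 * p  # the doubling shortcut
    print(f"{name}: exact = {exact} ({float(exact):.1%}), "
          f"naive = {naive} ({float(naive):.1%})")
```

Using `Fraction` rather than floats keeps the arithmetic exact, so the gap between the two answers is not blurred by rounding.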

It seems like this shortcut is better for small probabilities; why?

If something has probability p, then the chance of it happening at least once in two independent tries is:

1 - (1 - p)^2
 = 1 - (1 - 2p + p^2)
 = 1 - 1 + 2p - p^2
 = 2p - p^2

If p is very small, then p^2 is negligible, and 2p is only a very slight overestimate. As p gets larger, however, dropping the p^2 term becomes more of a problem.
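To see how quickly the dropped term matters, the sketch below checks that the overestimate is exactly p^2 and prints what share of the doubled estimate it makes up. The function name `overestimate` and the choice of sample probabilities are assumptions for illustration.

```python
from fractions import Fraction


def overestimate(p):
    """How much 2p exceeds the exact two-try probability 1 - (1 - p)**2."""
    exact = 1 - (1 - p) ** 2
    return 2 * p - exact


for denom in (2, 6, 100, 1_000_000):
    p = Fraction(1, denom)
    gap = overestimate(p)
    assert gap == p ** 2  # the dropped term is exactly p^2
    share = gap / (2 * p)  # portion of the doubled estimate that is error (p / 2)
    print(f"p = 1/{denom}: overestimate = {gap}, "
          f"share of estimate = {float(share):.5%}")
```

The share works out to p/2, which is why the error shrinks in step with the probability itself.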

This is the calculation that people do when adding micromorts: you can’t die from the same thing multiple times, but your chance of death stays low enough that the inaccuracy of naively combining these probabilities is much smaller than the margin of error on our estimates.
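A small sketch of the micromort case, comparing the naive sum with the exact combination of independent risks. The risk values in `micromorts` are invented for illustration and do not describe real activities.

```python
import math

# Hypothetical risks, in micromorts (one-in-a-million chances of death).
# These numbers are made up purely to illustrate the arithmetic.
micromorts = [1, 5, 8, 0.5, 10]

probs = [m / 1_000_000 for m in micromorts]

# Naive: add the probabilities directly.
naive_total = sum(probs)

# Exact (assuming independence): one minus the chance of surviving every risk.
exact_total = 1 - math.prod(1 - p for p in probs)

relative_gap = (naive_total - exact_total) / exact_total
print(f"naive total  = {naive_total * 1_000_000:.6f} micromorts")
print(f"exact total  = {exact_total * 1_000_000:.6f} micromorts")
print(f"relative gap = {relative_gap:.6%}")
```

`math.prod` requires Python 3.8 or later. With risks this small, the gap between the two totals is far below any plausible uncertainty in the inputs.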



[deleted]
- Measure Feb 18, 2021, 11:28 AM
  6 points
  
  
  The naive approximation gives a 100% chance of death for both options, but we know it’s less accurate for larger probabilities, so that should mean the two 50% risks are safer. In fact, 1 - (1 - 0.5)^2 = 75% is actually larger than 1 - (1 - 0.05)^20 = 64%. This means that the naive approximation is also bad for large numbers of iterations (large exponents).
- jefftk Feb 18, 2021, 7:35 PM
  4 points
  
  With such a high chance of death I’m not using any approximations!
```
>>> round((1 - 1/20)**20, 2)  # chance of surviving twenty 5% risks
0.36
>>> (1 - 1/2)**2  # chance of surviving two 50% risks
0.25
```
  - Jon Zero Feb 18, 2021, 7:59 PM
    5 points
    
    If you’re being rushed into a decision, you don’t want to calculate at all! Just use the well-known fact that $(1 - \frac{1}{n})^{n}$ is an increasing function of $n$ for $n > 1$ (with limit $e^{-1}$, as it happens).
- just_browsing Feb 18, 2021, 9:46 PM
  0 points
  
  The intuitive way to think about this is the heuristic “small numbers produce more extreme outcomes”. Both choices have the same expected number of deaths. But the 50% option is higher variance than the 5% option. Our goal is to maximize the likelihood of getting the “0 deaths” outcome, which is an extreme outcome relative to the mean. So we can conclude the 50% option is better without doing any math.
  - philh Feb 22, 2021, 12:33 PM
    2 points
    
    You got the wrong answer, but I do like the idea of comparing variances, and at least for this distribution, whichever has greater variance will have more weight on 0. But in this case, the variance of the 50% option is 0.5 and the variance of the 5% option is 0.95. And indeed the 5% option is preferable. ($\mathrm{Binomial}(n, p)$ has variance $np(1 - p)$; if the means $np$ are the same, then whichever has lower $p$ will have higher variance.)
    - just_browsing Feb 22, 2021, 3:54 PM
      1 point
      
      That’ll teach me to post without thinking! Yes, you’re right that $np(1 - p)$ is the better way to deal with variance here. (Or honestly, the $(1 - \frac{1}{n})^{n}$ method from the above comment is the slickest way.)
      I had been thinking of a similar kind of situation, where you have a fixed $p$ and varying sample sizes $n$. Then, the smaller $n$ gives more extreme outcomes than larger $n$. Of course, this isn’t applicable here.
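
The two quantities compared in this thread, the chance of zero deaths and the binomial variance, can be checked numerically. This is a minimal sketch; the function names `survive_all` and `binomial_variance` are hypothetical, and n = 1000 is included only to show the trend toward 1/e.

```python
import math


def survive_all(n):
    """Chance of surviving n independent risks of 1/n each: (1 - 1/n)**n."""
    return (1 - 1 / n) ** n


def binomial_variance(n, p):
    """Variance of a Binomial(n, p) count of deaths: n * p * (1 - p)."""
    return n * p * (1 - p)


for n in (2, 20, 1000):
    p = 1 / n  # keeps the expected number of deaths, n * p, fixed at 1
    print(f"n = {n}: P(0 deaths) = {survive_all(n):.4f}, "
          f"variance = {binomial_variance(n, p):.3f}")

print(f"limit 1/e = {math.exp(-1):.4f}")
```

With the mean held at 1, both the survival chance and the variance rise as n grows, which is consistent with the 5% option being preferable.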