Kaj_Sotala comments on Probabilities Small Enough To Ignore: An attack on Pascal’s Mugging

Kaj_Sotala 16 Sep 2015 18:55 UTC
11 points

Similarly for creating 10^100 happy lives, how exactly would you go about doing that in our Universe?

By some alternative theory of physics that has a, say, .000000000000000000001 probability of being true.
- IlyaShpitser 16 Sep 2015 21:14 UTC
  5 points
  Parent
  Right, the point is to throw away certain deals. I am suggesting another approach from the OP.
  
  The OP says: ignore deals involving small numbers. I say: ignore deals that violate physical intuitions (as they are). Where my heuristic differs from the OP is my heuristic is willing to listen to someone trying to sell me the Brooklyn bridge if I think the story fundamentally makes sense to me, given how I think physics ought to work. I am worried about long shot cases not forbidden by physics explicitly (which the OP will ignore if the shot is long enough). My heuristic will fail if humans are missing something important about physics, but I am willing to bet we are not at this point.
  
  In your example, the OP and I will both reject, for different reasons. I because it will violate my intuition and the OP because there is a small number involved.
  - solipsist 22 Sep 2015 2:22 UTC
    1 point
    Parent
    Relativity seems totally, insanely physically impossible to me. That doesn’t mean that taking a trillion to one bet on the Michelson Morley experiment wouldn’t have been a good idea.
    - IlyaShpitser 22 Sep 2015 17:25 UTC
      1 point
      Parent
      May I recommend Feynman’s lectures then? I am not sure what the point is. Aristotle was a smart guy, but his physics intuition was pretty awful. I think we are in a good enough state now that I am comfortable using physical principles to rule things out.
      
      Arguably quantum mechanics is a better example here than relativity. But I think a lot of what makes QM weird isn’t about physics but about the underlying probability theory being non-standard (similarly to how complex numbers are kinda weird). So, e.g. Bell violations say there is no hidden variable DAG model underlying QM—but hidden variable DAG models are defined on classical probabilities, and amplitudes aren’t classical probabilities. Our intuitive notion of “hidden variable” is somehow tied to classical probability.
      
      It all has to bottom out somewhere—what criteria do you use to rule out solutions? I think physics is in better shape today than basically any other empirical discipline.
      - solipsist 23 Sep 2015 2:35 UTC
        0 points
        Parent
        Do you know, offhand, if Baysian networks have been extended with complex numbers as probabilities, or (reaching here) if you can do belief propagation by passing around qubits instead of bits? I’m not sure what I mean by either of these thing but I’m throwing keywords out there to see if anything sticks.
        IlyaShpitser 23 Sep 2015 4:27 UTC
        0 points
        Parent
        Yes they have, but there is no single generalization. I am not even sure what conditioning should mean.
        
        Scott A is a better guy to ask.
      - solipsist 22 Sep 2015 21:23 UTC
        0 points
        Parent
        I don’t think the consensus of physicists is good enough for you to place that much faith in it. As I understand modern day cosmology, the consensus view holds that universe once grew by a factor of 10^78 for no reason. Would you pass up a 1 penny to $10,000,000,000 bet that cosmologists of the future will believe creating 10^100 happy humans is physically possible?
        
        what criteria do you use to rule out solutions?
        
        I don’t know :-(. Certainly I like physics as a darn good heuristic, but I don’t think I should reject bets with super-exponentially good odd based on my understanding of physics. A few bits of information from an expert would be enough to convince me that I’m wrong about physics, and I don’t think I should reject a bet with a payout better than 1 / the odds I will see those bits.
- Pentashagon 19 Sep 2015 6:09 UTC
  −1 points
  Parent
  Which particular event has P = 10^-21? It seems like part of the pascal’s mugging problem is a type error: We have a utility function U(W) over physical worlds but we’re trying to calculate expected utility over strings of English words instead.
  
  Pascal’s Mugging is a constructive proof that trying to maximize expected utility over logically possible worlds doesn’t work in any particular world, at least with the theories we’ve got now. Anything that doesn’t solve reflective reasoning under probabilistic uncertainty won’t help against Muggings promising things from other possible worlds unless we just ignore the other worlds.
- V_V 18 Sep 2015 13:49 UTC
  −2 points
  Parent
  I’d say that if you assign a 10^-22 probability to a theory of physics that allows somebody to create 10^100 happy lives depending on your action, then you doing physics wrong.
  
  If you assign probability 10^-(10^100) to 10^100 lives,10^-(10^1000) to 10^1000 lives, 10^-(10^10000) to 10^10000 lives, and so on, then you are doing physics right and you will not fall for Pascal’s Mugging.
  - Kaj_Sotala 19 Sep 2015 17:23 UTC
    1 point
    Parent
    There seems to be no obvious reason to assume that the probability falls exactly in proportion to the number of lives saved.
    
    If GiveWell told me they thought that real-life intervention A could save one life with probability PA and real-life intervention B could save a hundred lives with probability PB, I’m pretty sure that dividing PB by 100 would be the wrong move to make.
    - V_V 19 Sep 2015 21:51 UTC
      −1 points
      Parent
      
      There seems to be no obvious reason to assume that the probability falls exactly in proportion to the number of lives saved.
      
      It is an assumption to make asymptotically (that is, for the tails of the distribution), which is reasonable due to all the nice properties of exponential family distributions.
      
      If GiveWell told me they thought that real-life intervention A could save one life with probability PA and real-life intervention B could save a hundred lives with probability PB, I’m pretty sure that dividing PB by 100 would be the wrong move to make.
      
      I’m not implying that.
      
      EDIT:
      
      As a simple example, if you model the number of lives saved by each intervention as a normal distribution, you are immune to Pascal’s Muggings. In fact, if your utility is linear in the number of lives saved, you’ll just need to compare the means of these distributions and take the maximum. Black swan events at the tails don’t affect your decision process.
      
      Using normal distributions may be perhaps appropriate when evaluating GiveWell interventions, but for a general purpose decision process you will have, for each action, a probability distribution over possible future world state trajectories, which when combined with an utility function, will yield a generally complicated and multimodal distribution over utility. But as long as the shape of the distribution at the tails is normal-like, you wouldn’t be affected by Pascal’s Muggings.
      - Kaj_Sotala 20 Sep 2015 11:38 UTC
        0 points
        Parent
        But it looks like the shape of the distributions isn’t normal-like? In fact, that’s one of the standard EA arguments for why it’s important to spend energy on finding the most effective thing you can do: if possible intervention outcomes really were approximately normally distributed, then your exact choice of an intervention wouldn’t matter all that much. But actually the distribution of outcomes looks very skewed; to quote The moral imperative towards cost-effectiveness:
        
        DCP2 includes cost-effectiveness estimates for 108 health interventions, which are presented in the chart below, arranged from least effective to most effective [...] This larger sample of interventions is even more disparate in terms of costeffectiveness. The least effective intervention analysed is still the treatment for Kaposi’s sarcoma, but there are also interventions up to ten times more cost-effective than education for high risk groups. In total, the interventions are spread over more than four orders of magnitude, ranging from 0.02 to 300 DALYs per $1,000, with a median of 5. Thus, moving money from the least effective intervention to the most effective would produce about 15,000 times the benefit, and even moving it from the median intervention to the most effective would produce about 60 times the benefit.
        
        It can also be seen that due to the skewed distribution, the most effective interventions produce a disproportionate amount of the benefits. According to the DCP2 data, if we funded all of these interventions equally, 80% of the benefits would be produced by the top 20% of the interventions. [...]
        
        Moreover, there have been health interventions that are even more effective than any of those studied in the DCP2. [...] For instance in the case of smallpox, the total cost of eradication was about $400 million. Since more than 100 million lives have been saved so far, this has come to less than $4 per life saved — significantly superior to all interventions in the DCP2.
        
        V_V 20 Sep 2015 12:46 UTC
        −1 points
        Parent
        I think you misunderstood what I said or I didn’t explain myself well: I’m not assuming that the DALY distribution obtained if you choose interventions at random is normal. I’m assuming that for each intervention, the DALY distribution it produces is normal, with an intervention-dependent mean and variance.
        
        I think that for the kind of interventions that GiveWell considers, this is a reasonable assumption: if the number of DALYs produced by each intervention is the result of a sum of many roughly independent variables (e.g. DALYs gained by helping Alice, DALYs gained by helping Bob, etc.) the total should be approximately normally distributed, due to the central limit theorem.
        
        For other types of interventions, e.g. whether to fund a research project, you may want to use a more general family of distributions that allows non-zero skewness (e.g. skew-normal distributions), but as long as the distribution is light-tailed and you don’t use extreme values for the parameters, you would not run into Pascal’s Mugging issues.