IlyaShpitser comments on Probabilities Small Enough To Ignore: An attack on Pascal’s Mugging

IlyaShpitser 16 Sep 2015 17:21 UTC
3 points
I like Scott Aaronson’s approach for resolving paradoxes that seemingly violate intuitions—see if the situation makes physical sense.

Like people bring up “blockhead,” a big lookup table that can hold an intelligent conversation with you for [length of time], and wonder whether this has ramifications for the Turing test. But blockhead is not really physically realizable for reasonable lengths.

Similarly for creating 10^100 happy lives, how exactly would you go about doing that in our Universe?
- Kaj_Sotala 16 Sep 2015 18:55 UTC
  11 points
  Parent
  
  Similarly for creating 10^100 happy lives, how exactly would you go about doing that in our Universe?
  
  By some alternative theory of physics that has a, say, .000000000000000000001 probability of being true.
  - IlyaShpitser 16 Sep 2015 21:14 UTC
    5 points
    Parent
    Right, the point is to throw away certain deals. I am suggesting another approach from the OP.
    
    The OP says: ignore deals involving small numbers. I say: ignore deals that violate physical intuitions (as they are). Where my heuristic differs from the OP is my heuristic is willing to listen to someone trying to sell me the Brooklyn bridge if I think the story fundamentally makes sense to me, given how I think physics ought to work. I am worried about long shot cases not forbidden by physics explicitly (which the OP will ignore if the shot is long enough). My heuristic will fail if humans are missing something important about physics, but I am willing to bet we are not at this point.
    
    In your example, the OP and I will both reject, for different reasons. I because it will violate my intuition and the OP because there is a small number involved.
    - solipsist 22 Sep 2015 2:22 UTC
      1 point
      Parent
      Relativity seems totally, insanely physically impossible to me. That doesn’t mean that taking a trillion to one bet on the Michelson Morley experiment wouldn’t have been a good idea.
      - IlyaShpitser 22 Sep 2015 17:25 UTC
        1 point
        Parent
        May I recommend Feynman’s lectures then? I am not sure what the point is. Aristotle was a smart guy, but his physics intuition was pretty awful. I think we are in a good enough state now that I am comfortable using physical principles to rule things out.
        
        Arguably quantum mechanics is a better example here than relativity. But I think a lot of what makes QM weird isn’t about physics but about the underlying probability theory being non-standard (similarly to how complex numbers are kinda weird). So, e.g. Bell violations say there is no hidden variable DAG model underlying QM—but hidden variable DAG models are defined on classical probabilities, and amplitudes aren’t classical probabilities. Our intuitive notion of “hidden variable” is somehow tied to classical probability.
        
        It all has to bottom out somewhere—what criteria do you use to rule out solutions? I think physics is in better shape today than basically any other empirical discipline.
        solipsist 23 Sep 2015 2:35 UTC
        0 points
        Parent
        Do you know, offhand, if Baysian networks have been extended with complex numbers as probabilities, or (reaching here) if you can do belief propagation by passing around qubits instead of bits? I’m not sure what I mean by either of these thing but I’m throwing keywords out there to see if anything sticks.
        IlyaShpitser 23 Sep 2015 4:27 UTC
        0 points
        Parent
        Yes they have, but there is no single generalization. I am not even sure what conditioning should mean.
        
        Scott A is a better guy to ask.
        solipsist 22 Sep 2015 21:23 UTC
        0 points
        Parent
        I don’t think the consensus of physicists is good enough for you to place that much faith in it. As I understand modern day cosmology, the consensus view holds that universe once grew by a factor of 10^78 for no reason. Would you pass up a 1 penny to $10,000,000,000 bet that cosmologists of the future will believe creating 10^100 happy humans is physically possible?
        
        what criteria do you use to rule out solutions?
        
        I don’t know :-(. Certainly I like physics as a darn good heuristic, but I don’t think I should reject bets with super-exponentially good odd based on my understanding of physics. A few bits of information from an expert would be enough to convince me that I’m wrong about physics, and I don’t think I should reject a bet with a payout better than 1 / the odds I will see those bits.
  - Pentashagon 19 Sep 2015 6:09 UTC
    −1 points
    Parent
    Which particular event has P = 10^-21? It seems like part of the pascal’s mugging problem is a type error: We have a utility function U(W) over physical worlds but we’re trying to calculate expected utility over strings of English words instead.
    
    Pascal’s Mugging is a constructive proof that trying to maximize expected utility over logically possible worlds doesn’t work in any particular world, at least with the theories we’ve got now. Anything that doesn’t solve reflective reasoning under probabilistic uncertainty won’t help against Muggings promising things from other possible worlds unless we just ignore the other worlds.
  - V_V 18 Sep 2015 13:49 UTC
    −2 points
    Parent
    I’d say that if you assign a 10^-22 probability to a theory of physics that allows somebody to create 10^100 happy lives depending on your action, then you doing physics wrong.
    
    If you assign probability 10^-(10^100) to 10^100 lives,10^-(10^1000) to 10^1000 lives, 10^-(10^10000) to 10^10000 lives, and so on, then you are doing physics right and you will not fall for Pascal’s Mugging.
    - Kaj_Sotala 19 Sep 2015 17:23 UTC
      1 point
      Parent
      There seems to be no obvious reason to assume that the probability falls exactly in proportion to the number of lives saved.
      
      If GiveWell told me they thought that real-life intervention A could save one life with probability PA and real-life intervention B could save a hundred lives with probability PB, I’m pretty sure that dividing PB by 100 would be the wrong move to make.
      - V_V 19 Sep 2015 21:51 UTC
        −1 points
        Parent
        
        There seems to be no obvious reason to assume that the probability falls exactly in proportion to the number of lives saved.
        
        It is an assumption to make asymptotically (that is, for the tails of the distribution), which is reasonable due to all the nice properties of exponential family distributions.
        
        If GiveWell told me they thought that real-life intervention A could save one life with probability PA and real-life intervention B could save a hundred lives with probability PB, I’m pretty sure that dividing PB by 100 would be the wrong move to make.
        
        I’m not implying that.
        
        EDIT:
        
        As a simple example, if you model the number of lives saved by each intervention as a normal distribution, you are immune to Pascal’s Muggings. In fact, if your utility is linear in the number of lives saved, you’ll just need to compare the means of these distributions and take the maximum. Black swan events at the tails don’t affect your decision process.
        
        Using normal distributions may be perhaps appropriate when evaluating GiveWell interventions, but for a general purpose decision process you will have, for each action, a probability distribution over possible future world state trajectories, which when combined with an utility function, will yield a generally complicated and multimodal distribution over utility. But as long as the shape of the distribution at the tails is normal-like, you wouldn’t be affected by Pascal’s Muggings.
        Kaj_Sotala 20 Sep 2015 11:38 UTC
        0 points
        Parent
        But it looks like the shape of the distributions isn’t normal-like? In fact, that’s one of the standard EA arguments for why it’s important to spend energy on finding the most effective thing you can do: if possible intervention outcomes really were approximately normally distributed, then your exact choice of an intervention wouldn’t matter all that much. But actually the distribution of outcomes looks very skewed; to quote The moral imperative towards cost-effectiveness:
        
        DCP2 includes cost-effectiveness estimates for 108 health interventions, which are presented in the chart below, arranged from least effective to most effective [...] This larger sample of interventions is even more disparate in terms of costeffectiveness. The least effective intervention analysed is still the treatment for Kaposi’s sarcoma, but there are also interventions up to ten times more cost-effective than education for high risk groups. In total, the interventions are spread over more than four orders of magnitude, ranging from 0.02 to 300 DALYs per $1,000, with a median of 5. Thus, moving money from the least effective intervention to the most effective would produce about 15,000 times the benefit, and even moving it from the median intervention to the most effective would produce about 60 times the benefit.
        
        It can also be seen that due to the skewed distribution, the most effective interventions produce a disproportionate amount of the benefits. According to the DCP2 data, if we funded all of these interventions equally, 80% of the benefits would be produced by the top 20% of the interventions. [...]
        
        Moreover, there have been health interventions that are even more effective than any of those studied in the DCP2. [...] For instance in the case of smallpox, the total cost of eradication was about $400 million. Since more than 100 million lives have been saved so far, this has come to less than $4 per life saved — significantly superior to all interventions in the DCP2.
        
        V_V 20 Sep 2015 12:46 UTC
        −1 points
        Parent
        I think you misunderstood what I said or I didn’t explain myself well: I’m not assuming that the DALY distribution obtained if you choose interventions at random is normal. I’m assuming that for each intervention, the DALY distribution it produces is normal, with an intervention-dependent mean and variance.
        
        I think that for the kind of interventions that GiveWell considers, this is a reasonable assumption: if the number of DALYs produced by each intervention is the result of a sum of many roughly independent variables (e.g. DALYs gained by helping Alice, DALYs gained by helping Bob, etc.) the total should be approximately normally distributed, due to the central limit theorem.
        
        For other types of interventions, e.g. whether to fund a research project, you may want to use a more general family of distributions that allows non-zero skewness (e.g. skew-normal distributions), but as long as the distribution is light-tailed and you don’t use extreme values for the parameters, you would not run into Pascal’s Mugging issues.
- VAuroch 16 Sep 2015 19:07 UTC
  2 points
  Parent
  It’s easy if they have access to running detailed simulations, and while the probability that someone secretly has that ability is very low, it’s not nearly as low as the probabilities Kaj mentioned here.
  - IlyaShpitser 16 Sep 2015 20:33 UTC
    2 points
    Parent
    It is? How much energy are you going to need to run detailed sims of 10^100 people?
    - Houshalter 17 Sep 2015 3:44 UTC
      3 points
      Parent
      How do you know you don’t exist in the matrix? And that the true universe above ours doesn’t have infinite computing power (or huge but bounded, if you don’t believe in infinity.) How do you know the true laws of physics in our own universe don’t allow such possibilities?
      
      You can say these things are unlikely. That’s literally specified in the problem. That doesn’t resolve the paradox at all though.
      - IlyaShpitser 17 Sep 2015 3:49 UTC
        1 point
        Parent
        I don’t know, but my heuristic says to ignore stories that violate sensible physics I know about.
        Houshalter 17 Sep 2015 5:27 UTC
        4 points
        Parent
        That’s fine. You can just follow your intuition, and that usually won’t lead you too wrong. Usually. However the issue here is programming an AI which doesn’t share our intuitions. We need to actually formalize our intuitions to get it to behave as we would.
        IlyaShpitser 17 Sep 2015 13:30 UTC
        2 points
        Parent
        What criterion do you use to rule out solutions?
      - V_V 18 Sep 2015 13:36 UTC
        −1 points
        Parent
        If you assume that the probability of somebody creating X lives decreases asymptotically as exp(-X) then you will not accept the deal. In fact, the larger the number they say, the less the expected utility you’ll estimate (assuming that your utility is linear in the number of lives).
        
        It seems to me that such epistemic models are natural. Pascal’s Mugging arises as a thought experiment only if you consider arbitrary probability distributions and arbitrary utility functions, which in fact may even cause the expectations to become undefined in the general case.
        Houshalter 19 Sep 2015 3:16 UTC
        0 points
        Parent
        
        If you assume that the probability of somebody creating X lives decreases asymptotically as exp(-X) then you will not accept the deal.
        
        I don’t assume this. And I don’t see any reason why I should assume this. It’s quite possible that there exist powerful ways of simulating large numbers of humans. I don’t think it’s likely, but it’s not literally impossible like you are suggesting.
        
        Maybe it even is likely. I mean the universe seems quite large. We could theoretically colonize it and make trillions of humans. By your logic, that is incredibly improbable. For no other reason than that it involves a large number. Not that there is any physical law that suggests we can’t colonize the universe.
        V_V 19 Sep 2015 11:17 UTC
        −1 points
        Parent
        
        I don’t think it’s likely, but it’s not literally impossible like you are suggesting.
        
        I’m not saying it’s literally impossible, I’m saying that its probability should decrease with the number of humans, faster than the number of humans.
        
        Maybe it even is likely. I mean the universe seems quite large. We could theoretically colonize it and make trillions of humans. By your logic, that is incredibly improbable. For no other reason than that it involves a large number.
        
        Not really. I said “asymptotically”. I was considering the tails of the distribution.
        We can observe our universe and deduce the typical scale of the stuff in it. Trillion of humans may not be very likely but they don’t appear to be physically impossible in our universe. 10^100 humans, on the other hand, are off scale. They would require a physical theory very different than ours. Hence we should assign to it a vanishingly small probability.
        Houshalter 19 Sep 2015 12:58 UTC
        0 points
        Parent
        
        I’m not saying it’s literally impossible
        
        1/3^^^3 is so unfathomably huge, you might as well be saying it’s literally impossible. I don’t think humans are confident enough to assign probabilities so low, ever.
        
        10^100 humans, on the other hand, are off scale. They would require a physical theory very different than ours. Hence we should assign to it a vanishingly small probability.
        
        I think EY had the best counter argument. He had a fictional scenario where a physicist proposed a new theory that was simple and fit all of our data perfectly. But the theory also implies a new law of physics that could be exploited for computing power, and would allow unfathomably large amounts of computing power. And that computing power could be used to create simulated humans.
        
        Therefore, if it’s true, anyone alive today has a small probability of affecting large amounts of simulated people. Since that has “vanishingly small probability”, the theory must be wrong. It doesn’t matter if it’s simple or if it fits the data perfectly.
        
        But it seems like a theory that is simple and fits all the data should be very likely. And it seems like all agents with the same knowledge, should have the same beliefs about reality. Reality is totally uncaring about what our values are. What is true is already so. We should try to model it as accurately as possible. Not refuse to believe things because we don’t like the consequences. That’s actually a logical fallacy.
        V_V 19 Sep 2015 14:55 UTC
        −1 points
        Parent
        
        1/3^^^3 is so unfathomably huge, you might as well be saying it’s literally impossible. I don’t think humans are confident enough to assign probabilities so low, ever.
        
        Same thing with numbers like 10^100 or 3^^^3.
        
        I think EY had the best counter argument. He had a fictional scenario where a physicist proposed a new theory that was simple and fit all of our data perfectly. But the theory also implies a new law of physics that could be exploited for computing power, and would allow unfathomably large amounts of computing power. And that computing power could be used to create simulated humans.
        
        EY can imagine all the fictional scenario he wants, this doesn’t mean that we should assign non-negligible probabilities to them.
        
        It doesn’t matter if it’s simple or if it fits the data perfectly.
        
        If.
        
        But it seems like a theory that is simple and fits all the data should be very likely. And it seems like all agents with the same knowledge, should have the same beliefs about reality. Reality is totally uncaring about what our values are. What is true is already so. We should try to model it as accurately as possible. Not refuse to believe things because we don’t like the consequences.
        
        If your epistemic model generates undefined expectations when you combine it with your utility function, then I’m pretty sure we can say that at least one of them is broken.
        
        EDIT:
        
        To expand: just because we can imagine something and give it a short English description, it doesn’t mean that it is simple in epistemical terms. That’s the reason why “God” is not a simple hypothesis.
        Houshalter 20 Sep 2015 2:25 UTC
        −1 points
        Parent
        
        EY can imagine all the fictional scenario he wants, this doesn’t mean that we should assign non-negligible probabilities to them.
        
        Not negligible, zero. You literally can not believe in an theory of physics that allows large amounts of computing power. If we discover that an existing theory like quantum physics allows us to create large computers, we will be forced to abandon it.
        
        If your epistemic model generates undefined expectations when you combine it with your utility function, then I’m pretty sure we can say that at least one of them is broken.
        
        Yes something is broken, but it’s definitely not our prior probabilities. Something like solomonoff induction should generate perfectly sensible predictions about the world. If knowing those predictions makes you do weird things, that’s a problem with your decision procedure. Not the probability function.
        V_V 20 Sep 2015 9:29 UTC
        0 points
        Parent
        
        Not negligible, zero.
        
        You seem to have a problem with very small probabilities but not with very large numbers. I’ve also noticed this in Scott Alexander and others. If very small probabilities are zeros, then very large numbers are infinities.
        
        You literally can not believe in an theory of physics that allows large amounts of computing power. If we discover that an existing theory like quantum physics allows us to create large computers, we will be forced to abandon it.
        
        Sure. But since we know no such theory, there is no a priori reason to assume it exists with non-negligible probability.
        
        Something like Solomonoff induction should generate perfectly sensible predictions about the world.
        
        Nope, it doesn’t. If you apply Solomonoff induction to predict arbitrary integers, you get undefined expectations.
        Expand this thread
        entirelyuseless 20 Sep 2015 12:34 UTC
        1 point
        Parent
        Solomonoff induction combined with an unbounded utility function gives undefined expectations. But Solomonoff induction combined with a bounded utility function can give defined expectations.
        
        And Solomonoff induction by itself gives defined predictions.
        V_V 20 Sep 2015 14:54 UTC
        0 points
        Parent
        
        Solomonoff induction combined with an unbounded utility function gives undefined expectations. But Solomonoff induction combined with a bounded utility function can give defined expectations.
        
        Yes.
        
        And Solomonoff induction by itself gives defined predictions.
        
        If you try to use it to estimate the expectation of any unbounded variable, you get an undefined value.
        entirelyuseless 20 Sep 2015 15:11 UTC
        1 point
        Parent
        Probability is a bounded variable.
        V_V 20 Sep 2015 19:20 UTC
        0 points
        Parent
        Yes, but I’m not sure I understand this comment. Why would you want to compute an expectation of a probability?
        Houshalter 21 Sep 2015 8:08 UTC
        0 points
        Parent
        Yes I understand that 3^^^3 is finite. But it’s so unfathomably large, it might as well be infinity to us mere mortals. To say an event has 1/3^^^3 is to say you are certain it will never happen, ever. No matter how much evidence you are provided. Even if the sky opens up and the voice of god bellows to you and says “ya its true”. Even if he comes down and explains why it is true to you, and shows you all the evidence you can imagine.
        
        The word “negligible” is obscuring your true meaning. There is a massive—no, unfathomable—difference between 1/3^^^3 and “small” numbers like 1/10^80 (1 divided by the number of atoms in the universe.)
        
        To use this method is to say there are hypotheses with relatively short descriptions which you will refuse to believe. Not just about muggers, but even simple things like theories of physics which might allow large amounts of computing power. Using this method, you might be forced to believe vastly more complicated and arbitrary theories that fit the data worse.
        
        If you apply Solomonoff induction to predict arbitrary integers, you get undefined expectations.
        
        Solomonoff inductions predictions will be perfectly reasonable, and I would trust them far more than any other method you can come up with. What you choose to do with the predictions could generate nonsense results. But that’s not a flaw with SI, but with your method.
    - VAuroch 17 Sep 2015 1:36 UTC
      0 points
      Parent
      Point, but not a hard one to get around.
      
      There is a theoretical lower bound on energy per computation, but it’s extremely small, and the timescale they’ll be run in isn’t specified. Also, unless Scott Aaronson’s speculative consciousness-requires-quantum-entanglement-decoherence theory of identity is true, there are ways to use reversible computing to get around the lower bounds and achieve theoretically limitless computation as long as you don’t need it to output results. Having that be extant adds improbability, but not much on the scale we’re talking about.