Rejecting components with extremely low probabilities, even if they have extremely large utilities, works better, avoids selection bias, and is what common sense would tell you to do. I think of it as “ignore noise”. At least it works for humans; an AGI may have to do something else.
I don’t understand in what sense it works better, or what the relation to selection bias is. Ignoring components with low probability is a good heuristic, since you have bounded computing resources. However, it seems reasonable to require your decision theory to deal with extreme cases in which you can spare the computing resources to take these components into account. On the other hand, I agree that it is rarely practical for humans to emulate perfect Bayesian reasoners.
Selection bias: when you are presented with a specific mugging scenario, you ought to realize that there are many more extremely unlikely scenarios where the payoff is just as high, so selecting just one of them to act on is suboptimal.
As for the level at which to stop calculating, bounded computational power is a good heuristic, but I suspect that there is a better way to detect the cutoff (analogous to what is known as an infrared cutoff in physics). If you plot the number of choices against their (log) probability, then once you go low enough, the number of choices explodes. My guess is that for moderately low probabilities you would get an exponential increase in the number of outcomes, but for very low probabilities the growth becomes super-exponential. This is, of course, speculation; I would love to see some calculation or modeling, but this is what my intuition tells me.
There can’t be more than 2^n outcomes each with probability 2^-n.
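A quick way to see that counting constraint (a minimal sketch: “hypotheses” here are just raw bitstrings with a 2^-length weighting, ignoring prefix-free coding and normalization):

```python
from itertools import product

# Enumerate every bitstring of length n and weight it 2^-n: there are exactly
# 2^n of them, so the number of hypotheses at probability level 2^-n grows
# exponentially in n, and the total mass at each level stays bounded (here, 1).
for n in range(1, 13):
    outcomes = list(product("01", repeat=n))
    mass_at_level = len(outcomes) * 2.0 ** -n
    print(f"length {n:2d}: {len(outcomes):5d} outcomes, mass at this level = {mass_at_level}")
```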
OK, I have thought about it some more. The issue is how accurately one can evaluate the probabilities. If the best you can do is, say, 1%, then you are forced to count even the potentially very unlikely possibilities at 1% odds. The accuracy of the probability estimates would depend on something like the depth of your Solomonoff induction engine. If you are confronted with a Pascal’s mugger and your induction engine returns “the string required to model the mugger as honest and capable of carrying out the threat is longer than the longest algorithm I can process”, you are either forced to use the probability corresponding to the longest string, or to discard the hypothesis outright. What I am saying is that the latter is better than the former.
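A toy rendering of those two options (the 2^-length weighting and the specific bit counts are illustrative stand-ins, not an actual induction engine):

```python
# Assume a length-bounded induction engine that cannot represent any hypothesis
# longer than max_bits. "floor" forces an over-long hypothesis up to the
# smallest representable probability; "discard" drops it outright.

def bounded_prior(description_bits: int, max_bits: int, policy: str) -> float:
    if description_bits <= max_bits:
        return 2.0 ** -description_bits
    if policy == "floor":
        return 2.0 ** -max_bits   # the probability of the longest string we can process
    return 0.0                    # discard the hypothesis outright

# A mugger whose "honest and capable" model needs far more bits than we can process:
print(bounded_prior(10_000, max_bits=100, policy="floor"))    # ~7.9e-31 -- still swamped by a 3^^^3 payoff
print(bounded_prior(10_000, max_bits=100, policy="discard"))  # 0.0
```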
The primary problem with Pascal’s Mugging is that the Mugging string is short and easy to evaluate. 3^^^3 is a big number; it implies a very low probability, but not necessarily 1 / 3^^^3; so just how outrageous can a mugging be without being discounted for low probability? That least-likely-but-still-manageable Mugging will still get you. If you’re allowed to reason about descriptions of utility, and not just shut up and multiply to evaluate the utility of simulated worlds, then in the worst case you have to worry about the Mugger that offers you BusyBeaver(N) utility, where 2^-N is the lowest probability that you can process. BusyBeaver(N) is well-defined, although uncomputable, and it is at least as large as the output of any other N-bit program. Unfortunately that means BusyBeaver(N) × 2^-N > C, for some N-bit constant C, or in other words EU(Programs-of-length-N) is O(BusyBeaver(N)). It doesn’t matter what the mugger offers, or if you mug yourself. Any N-bit utility calculation program has expected utility O(BB(N)) because it *might* yield BB(N) utility.
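Made slightly more explicit (a sketch assuming non-negative utilities and that the BB(N)-achieving program is among the N-bit hypotheses, each weighted at roughly 2^-N):

$$ 2^{-N}\,\mathrm{BB}(N) \;\le\; \mathrm{EU}(\text{N-bit utility programs}) \;\le\; \mathrm{BB}(N), $$

so the whole expected-utility sum is pinned to BB(N), up to a factor of 2^-N, no matter what any particular mugger offers.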
The best not-strictly-bounded-utility solution I have against this is discounting the probability of programs as a function of their running time as well as their length. Let 1/R be the probability that any given step of a process will cause it to completely fail, as opposed to halting with output or never halting. Solomonoff Induction can then be redefined as the sum, over programs P producing an output S in N steps, of 2^-Length(P) × ((R − 1) / R)^N. It is possible to compute a prior probability with error less than B, for any sequence S and finite R, by enumerating all programs shorter than log_2(1 / B) bits that halt in fewer than ~R / B steps. All un-enumerated programs have cumulative probability less than B of generating S.
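A minimal sketch of the enumeration this defines, over a deliberately silly stand-in interpreter (the `run` machine, the program encoding, and the choice of R and B are all placeholders, not part of the proposal above):

```python
from itertools import product
from math import ceil, log2

def run(program: str, max_steps: int):
    """Placeholder interpreter, purely for illustration: a program 'halts' after
    len(program) steps and outputs its own bits reversed. A real instantiation
    would be a universal machine executed under a step budget."""
    steps = len(program)
    if steps > max_steps:
        return None                      # treated as not halting within the budget
    return program[::-1], steps

def discounted_prior(target: str, R: float, B: float) -> float:
    """Approximate the time-discounted prior of `target` with error below B:
    enumerate programs up to ~log2(1/B) bits that halt within ~R/B steps,
    weighting each by 2^-Length(P) * ((R - 1) / R)^steps."""
    max_len = ceil(log2(1.0 / B))
    max_steps = int(R / B)
    survive = (R - 1.0) / R              # per-step probability of not failing
    total = 0.0
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            result = run("".join(bits), max_steps)
            if result is None:
                continue
            output, steps = result
            if output == target:
                total += 2.0 ** -n * survive ** steps
    return total

print(discounted_prior("0110", R=1e6, B=1e-3))   # ~0.0625, from the single matching program
```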
For Pascal’s Mugging it suffices to determine B based on the number of steps required to, e.g., simulate 3^^^3 humans. Setting 3^^^3 ~= R / B, either the prior probability of the Mugger being honest is infinitesimal, or it is infinitesimally unlikely that the universe will last fewer than the minimum 3^^^3 Planck units necessary to implement the mugging. Given some evidence about the expected lifetime of the Universe, the mugging can be rejected.
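Back-of-the-envelope arithmetic for that dichotomy, with stand-in numbers (3^^^3 is far too large to represent directly, so a much smaller step count has to make the point):

```python
from math import log, log1p

# Under the discount, the chance that a process survives K steps is
# ((R - 1) / R)^K ~ exp(-K / R).
K = 10 ** 100                  # stand-in step count, vastly smaller than 3^^^3
R = 8 * 10 ** 60               # illustrative: roughly the age of the universe in Planck times

log10_survival = K * log1p(-1.0 / R) / log(10)
print(f"log10 P(survive K steps) ~ {log10_survival:.2e}")   # about -5.4e+38
```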
The biggest advantage of this method over a fixed bounded utility function is that R is parameterized by the agent’s evidence about its environment, and can change with time. The longer a computer successfully runs an algorithm, the larger the expected value of R.
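For what it’s worth, here is one way that update could look; the rule below (Laplace’s rule of succession on the per-step failure rate) is my own illustration, not something specified in the comment above:

```python
# Treat the per-step failure probability p = 1/R as unknown with a uniform
# prior, and update on T consecutive failure-free steps. Laplace's rule of
# succession gives E[p | T failure-free steps] = 1 / (T + 2), so the implied R
# grows roughly linearly with observed uptime.
for T in (10, 10_000, 10_000_000):
    p_hat = 1.0 / (T + 2)
    print(f"after {T:>12,} failure-free steps: E[p] ~ {p_hat:.2e}, implied R ~ {1 / p_hat:,.0f}")
```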
OK, I guess I understand your argument: the mugger can construct an algorithm producing very high utility using only N bits, or I can construct a whole whack of similar algorithms in response, and I end up unable to do anything because of the forest of low-probability, high-utility choices, which are known to be present if only you spend enough time looking for them. So that’s why you suggest limiting not (only) the number of states, but (also) the number of steps. I wonder what Eliezer and others think about that.
You are right, of course. I have to rethink and rephrase it.
This may be an impossible task, but can you elaborate on your intuition? My intuitions, such as they are, are Solomonoffy and do not predict super-exponential growth.
I tried to elaborate in another comment, suggesting that we should reject all hypotheses more complicated than our Solomonoff engine can handle.