I have been waiting for someone to formalize this objection to Pascal’s mugging for a long time, and I’m very happy that, now that it has been done, it has been done so well.
What precisely is the objection to Pascal’s Mugging in the post? Just that the probability of the mugger being able to deliver goes down with N? This objection has been given thousands of times, and the counter-response is that the probability can’t go down fast enough to outweigh the increase in utility. This is formalised here.
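To spell out the “can’t go down fast enough” claim, here is a rough sketch of the standard argument under a complexity-based prior; this is my own paraphrase rather than anything taken from the linked formalisation, and K denotes prefix Kolmogorov complexity.

```latex
% Rough sketch (my paraphrase, not from the linked paper): if the prior weight of a
% claimed payoff of n people scales like 2^{-K(n)}, then along simple numbers such
% as n_k = 2^k the probability-times-payoff product is unbounded, because
% K(n_k) \le \log_2 k + O(\log\log k):
\[
  p(n_k)\,n_k \;\approx\; 2^{-K(n_k)}\cdot 2^{k}
  \;\ge\; \frac{c\,2^{k}}{k\,(\log_2 k)^{2}}
  \;\longrightarrow\; \infty \quad (k \to \infty).
\]
% So under such a prior the probability cannot fall fast enough to cancel the
% growth of the stated payoff.
```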
The paper is really useless. The entire methodology of requiring some non-zero computable bound on the probability that the function with a given Gödel number will turn out to be correct is deeply flawed. The failure is really about the inability of a computable function to check whether two Gödel numbers code for the same function, not about utilities and probability. Similarly, insisting that the utilities be bounded below by a computable function on the Gödel numbers of the computable functions (rather than on the functions themselves) is unrealistic.
Note that one implicitly expects that, if you consider longer and longer sequences of good events followed by nothing, the utility will continue to rise. They basically rule out all the reasonable unbounded utility functions by fiat, by requiring the infinite sequence of good events to have finite utility.
I mean, consider the following really simple model. At each time step I either receive a 1 or a 0 bit from the environment. The utility is the number of consecutive 1s that appear before the first 0. The probability measure is the standard coin-flip measure. Everything is nice and the expected utility over any Borel set of outcomes is well defined, but the utility function goes off to infinity and is indeed undefined on the infinite sequence of 1s.
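For concreteness, here is a minimal sketch (mine, not from the paper under discussion) that computes the expected utility of this coin-flip model both exactly and by simulation; the exact value is the sum over k of k · 2^-(k+1) = 1, even though the utility itself is unbounded.

```python
# Minimal sketch (not from the paper): expected utility of the coin-flip model
# described above. Utility = number of consecutive 1s before the first 0, under
# the fair-coin measure, so P(U = k) = 2^-(k+1) and E[U] = sum_k k * 2^-(k+1) = 1.
import random

def exact_expected_utility(terms: int = 200) -> float:
    """Truncated sum of k * 2^-(k+1); converges to 1 very quickly."""
    return sum(k * 2.0 ** -(k + 1) for k in range(terms))

def simulated_expected_utility(samples: int = 100_000) -> float:
    """Monte Carlo estimate: flip a fair coin until the first 0, count the 1s."""
    total = 0
    for _ in range(samples):
        run = 0
        while random.random() < 0.5:   # a '1' bit with probability 1/2
            run += 1
        total += run
    return total / samples

print(exact_expected_utility())        # ~1.0
print(simulated_expected_utility())    # ~1.0, despite the utility being unbounded
```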
An awful paper, but it’s hard for non-experts to see where it gets the model wrong.
The right analysis is simply that we want a utility function that is L1 integrable on the space of outcomes with respect to the probability measure. That is enough to get rid of Pascal’s mugging.
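Written out, the proposed condition is just that the utility function be integrable; this is my rendering of the comment’s suggestion, with U the utility, μ the probability measure, and Ω the space of outcomes.

```latex
% My rendering of the L^1 condition proposed above: the utility U is integrable
% with respect to the probability measure \mu on the outcome space \Omega,
\[
  \int_{\Omega} \lvert U(\omega)\rvert \, d\mu(\omega) \;<\; \infty ,
\]
% so expected utilities are finite even when U is unbounded, as in the coin-flip
% example above, where \int_{\Omega} U \, d\mu = \sum_{k \ge 0} k\,2^{-(k+1)} = 1.
```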
The post’s argument is more substantive than that the probability for the mugger to deliver goes down with N. Did you read the section of the post titled “Pascal’s Mugging”? I haven’t read the de Blanc paper that you link to, but I would guess that he doesn’t assume a (log-)normal prior for the effectiveness of actions and so doesn’t Bayesian-adjust the quantities downward as sharply as the present post suggests one should.
The argument is that simple numbers like 3^^^3 should be considered much more likely than random numbers of similar size, since they have short descriptions and so the mechanisms by which that many people (or whatever) hang in the balance are less complex. For instance, you’re more likely to win a prize of $1,000,000 than of $743,328, even though the former is larger. de Blanc considers priors of this form, of which the normal isn’t an example.
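As a toy illustration of the prize example (my own construction, not de Blanc’s actual prior), one can weight numbers by a crude description-length proxy instead of by magnitude:

```python
# Toy illustration (my own construction, not de Blanc's prior): weight numbers by a
# crude description-length proxy instead of by magnitude. 1,000,000 can be written
# as the 5-character expression "10**6", while 743,328 has no shorter form here than
# its 6 decimal digits, so the simpler number gets the larger weight despite being
# bigger. Real Kolmogorov complexity is uncomputable; this proxy only checks a few
# obvious representations.

def description_length(n: int) -> int:
    """Length of the shortest representation among a few candidates."""
    candidates = [str(n)]                      # plain decimal digits
    for base in range(2, 11):                  # exact power expressions like "10**6"
        value, k = 1, 0
        while value < n:
            value *= base
            k += 1
        if value == n:
            candidates.append(f"{base}**{k}")
    return min(len(c) for c in candidates)

def toy_weight(n: int) -> float:
    """Unnormalised 2^-(description length) weight."""
    return 2.0 ** -description_length(n)

for n in (1_000_000, 743_328):
    print(n, description_length(n), toy_weight(n))
# 1000000 -> length 5 ("10**6"),  weight 2^-5
# 743328  -> length 6 ("743328"), weight 2^-6
```

Under weights of this general shape, round numbers like $1,000,000 come out ahead of arbitrary numbers of comparable size, which is the intuition the comment above attributes to de Blanc’s choice of priors.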
Surely an action is more likely to have an expected value of saving 3.2 lives than pi lives; the distribution of values of actions is probably not literally log-normal, partially for the reason that you just gave, but I think that a log-normal distribution is much closer to the truth than a distribution which assigns probabilities strictly by Kolmogorov complexity. Here I’d refer back to my response to cousin_it’s comment.
Surely an action is more likely to have an expected value of saving 3.2 lives than pi lives
I’m not so sure. Do you mean (3.2 lives | pi lives) to log(3^^^3) digits of precision? If you don’t, I think it misleads intuition to compare the probability of an action saving 3.2 lives, to two decimal places, with that of saving pi lives, to indefinite precision.
I can’t think of any right now, but I feel like if I really put my creativity to work for long enough, I could think of more ways to save 3.14159265358979323846264 lives than 3.20000000000000000000000 lives.
I meant 3.2 lives to arbitrary precision vs. pi lives to arbitrary precision. Anyway, my point was that there’s going to be some deviation from a log-normal distribution on account of contingent features of the universe that we live in (mathematical, physical, biological, etc.), but that a log-normal distribution is probably a closer approximation to the truth than what one would hope to come up with via a systematic analysis of the complexity of the numbers involved.
The argument is that simple numbers like 3^^^3 should be considered much more likely than random numbers with a similar size, since they have short descriptions and so the mechanisms by which that many people (or whatever) hang in the balance are less complex.
Consider the options A = “a proposed action affects 3^^^3 people” and B = “the number 3^^^3 was made up to make a point”. Given my knowledge about the mechanisms that affect people in the real world and about the mechanisms people use to make points in arguments, I would say that the likelihood of A versus B is hugely in favor of B. This is because the relevant probabilities for calculating the likelihood scale (for large values, and to a first-order approximation) with the size of the number in question for option A and with the complexity of the number for option B. I didn’t read de Blanc’s paper further than the abstract, but from that and your description of the paper it seems that its setting is far more abstract and uninformative than the setting of Pascal’s mugging, in which we also have the background knowledge of our usual life experience.
The setting in my paper allows you to have any finite amount of background knowledge.
I mean that using a probability distribution rather than just saying numbers clearly dispels a naive Pascal’s mugging. I am open to the possibility that more heavily contrived Pascal’s muggings may exist that can still exploit an unbounded utility function, but I’ll read that paper and see what I think after that.
Edit: From the abstract:
The agent has a utility function on outputs from the environment. We show that if this utility function is bounded below in absolute value by an unbounded computable function, then the expected utility of any input is undefined. This implies that a computable utility function will have convergent expected utilities iff that function is bounded.
What this sounds like it is saying is that, under an unbounded utility function, literally any action has undefined expected utility. In that case it just says that unbounded utility functions are useless from the perspective of decision theory. I’m not sure how it constitutes evidence that the problem of Pascal’s Mugging is unresolved.
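In symbols, my paraphrase of the abstract quoted above (not the paper’s exact statement) is roughly the following:

```latex
% My paraphrase of the quoted abstract, not the paper's exact statement: if there is
% an unbounded computable f with |U(x)| >= f(x) for every output x, then for every
% input a the expectation
\[
  \mathbb{E}[\,U \mid a\,] \;=\; \sum_{x} P(x \mid a)\,U(x)
\]
% fails to converge; equivalently, a computable U has convergent expected utilities
% for all inputs iff U is bounded.
```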
(Yes, I know this is an old post.)
Suppose that the probability I assign to the mugger being able to deliver is equal to 1 / ((utility delivered if the mugger is telling the truth) ^ 2). Wouldn’t that be a probability that goes down fast enough to outweigh the increase in utility?
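Spelling out the arithmetic behind this suggestion (with U standing for the utility the mugger promises to deliver):

```latex
% The suggestion above, spelled out: with p(U) = 1/U^2, the naive expected payoff
% of taking the deal is
\[
  p(U)\cdot U \;=\; \frac{1}{U^{2}}\cdot U \;=\; \frac{1}{U} \;\longrightarrow\; 0
  \quad (U \to \infty),
\]
% so this particular assignment does fall fast enough; the question is whether such
% a prior is compatible with the constraints assumed in the linked paper (e.g. with
% treating the mugger's statement as evidence), which is what the reply below
% gestures at.
```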
I’m afraid that I don’t remember the details of the paper I linked to above; you’ll have to look at it to see why they don’t consider that a valid distribution (perhaps because the things that the mugger says have to be counted as evidence, and this can’t decrease that quickly for some reason? I’m afraid I don’t remember).