You were claiming that in a situation where a median-maximizing agent has a large number of trivially inconvenient actions that prevent small risks of death, heavy injury, or light injury, it would accept a 49% chance of light injury, but you seemed to imply that it would not accept a 49% chance of death. I was trying to point out that this appears to be incorrect.
I’m not entirely sure what your objection is; we seem to be talking at cross purposes.
Let’s try a simpler version. If we assume that the cost of buckling seat belts is incommensurable (in practice) with light injury (and heavy injury, and death), then the median maximising agent will accept a 49.99...% chance of (light injury or heavy injury or death) over their lifetime. Since light injury is much more likely than death, this in effect forces the probability of death down to a very low amount.
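To make the arithmetic concrete, here is a minimal sketch; the per-event probabilities are made up purely for illustration:

```python
# Illustrative only: per-event risks are made up. Suppose each unmitigated
# risky event carries these independent probabilities:
p_light, p_heavy, p_death = 1e-2, 1e-3, 1e-4
p_any = 1 - (1 - p_light) * (1 - p_heavy) * (1 - p_death)

# The median maximiser skips trivially inconvenient precautions only while
# the lifetime chance of (light injury or heavy injury or death) stays
# below 50%.
n = 0
while 1 - (1 - p_any) ** (n + 1) < 0.4999:
    n += 1

print("unmitigated events:", n)
print("lifetime P(any injury or death):", round(1 - (1 - p_any) ** n, 4))
print("lifetime P(death):", round(1 - (1 - p_death) ** n, 4))
# Because light injury dominates each event's risk, the lifetime death
# probability comes out under 1% even though P(any bad outcome) is pushed
# right up against 50%.
```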
It’s just an illustration of the general point that median maximising seems to perform much better in real-world problems than its failure in simple theoretical ones would suggest.
Since light injury is much more likely than death, this in effect forces the probability of death down to a very low amount.
No, it doesn’t. That does not address the fact that the agent will not preferentially accept light injury over death. Adopting a policy of immediately committing suicide once you’ve been injured enough to force you into the bottom half of outcomes does not decrease median utility. The agent has no incentive to prevent further damage once it is in the bottom half of outcomes. As a less extreme example, the value of house insurance to a median maximizer is 0, just because losing your house is a bad outcome even if you get insurance money for it. This isn’t a weird hypothetical that relies on it being an isolated decision; it’s a real-life decision that a median maximizer would get wrong.
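To make the insurance example concrete, here is a minimal sketch with made-up utilities; the point is only that the fire sits in the bottom half of outcomes with or without insurance, so the premium strictly lowers the median:

```python
# Illustrative only: the utilities are made up.
import numpy as np

rng = np.random.default_rng(0)
fire = rng.random(100_000) < 0.05  # 5% chance your house burns down

# Without insurance: utility 100 if the house survives, 0 if it burns.
# With insurance: pay a premium (utility 99 if it survives) and get a
# payout that softens the loss (utility 50 if it burns).
u_uninsured = np.where(fire, 0.0, 100.0)
u_insured = np.where(fire, 50.0, 99.0)

print("median:", np.median(u_uninsured), "vs", np.median(u_insured))  # 100 vs 99
print("mean:", u_uninsured.mean(), "vs", u_insured.mean())            # ~95 vs ~96.5
# The fire puts you in the bottom half of outcomes either way, so the median
# maximizer sees only the premium and refuses the insurance; the mean
# maximizer buys it.
```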
A more general way of stating how multiple decisions improve median maximalisation: the median maximaliser is indifferent to outcomes not at the median (eg suicide vs light injury). But as the decision tree grows and the number of possible situations grows with it, the probability increases that outcomes not at the median in a one-shot problem will affect the median in the more complex situation.
This argument relies on your utility being a sum of effects from each of the decisions you made, but in reality, your decisions interact in much more complicated ways, so that isn’t a realistic model.
Also, if your defense of median maximization consists entirely of an argument that it approximates mean maximization, then what’s the point of all this? Why not just use expected utility maximization? I’m expecting you to bring up Pascal’s mugging here, but since VNM-rationality does not force you to pay the mugger, you’ll have to do better than that.
This argument relies on your utility being a sum of effects from each of the decisions you made
It doesn’t require that in the least. I don’t know if, eg, quadratic or higher-order effects would improve or worsen the situation.
but since VNM-rationality does not force you to pay the mugger
The consensus at the moment seems to be that if you have unbounded utility, it does force you to pay some muggers. Now, I’m perfectly fine with bounding your utility to avoid muggers, but that’s the kind of non-independent decision some people don’t like ;-)
The real problem is things like the Cauchy distribution, or any distribution without an expectation value at all. Saying “VNM works fine as long as we don’t face these difficult choices, then it breaks down” is very unsatisfactory. I’m also interested in seeing what happens when “expect to win” and “win in expectation” become quite distinct—a rare event, in practice.
It doesn’t require that in the least.
The more concrete argument you made previously does rely on it. If what you’re saying now doesn’t, then I guess I don’t understand it.
but that’s the kind of non-independent decision some people don’t like
I don’t follow. Maximizing the expected value of a bounded utility function does respect independence.
That was an example. There’s another one in http://lesswrong.com/lw/1d5/expected_utility_without_the_independence_axiom/ which relies on “not risk loving”. That post doesn’t mention the median, but it does mention the standard deviation, and we know the median must be within one SD of the mean (and often much closer).
That example also relies on your utility being the sum of components that are determined from your various actions.
Choosing to bound an unbounded utility function to avoid muggers does not.
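As an aside, the claim just above that the median must be within one standard deviation of the mean has a short standard proof; a sketch, for any random variable X with mean, median, and standard deviation as named:

```latex
% For any X with mean $\mu$, median $m$, and standard deviation $\sigma$:
\[
|\mu - m| \;=\; \bigl|\mathbb{E}[X - m]\bigr|
          \;\le\; \mathbb{E}\,|X - m|
          \;\le\; \mathbb{E}\,|X - \mu|
          \;\le\; \sqrt{\mathbb{E}\bigl[(X - \mu)^2\bigr]}
          \;=\; \sigma,
\]
% by Jensen's inequality, the fact that the median minimizes
% $c \mapsto \mathbb{E}|X - c|$, and the Cauchy--Schwarz inequality.
```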
To be clear, I was not suggesting that you have an unbounded utility function that it would make sense for you to maximize if it weren’t for Pascal’s mugger, so you should bound it when there might be a Pascal’s mugger around. I was suggesting that the utility function it makes sense for you to maximize is bounded. Unbounded utility functions are so loony they never should have been seriously considered in the first place; Pascal’s mugger is merely a dramatic illustration of that fact.
Edit: I probably shouldn’t rely on the theoretical reasons to prefer bounded utility functions, since they are not completely airtight and actual human preferences are more important anyway. So let’s look at actual human preferences. Suppose you’ve got a rational agent with preference relation “<”, and you want to test whether its utility function is bounded or unbounded. Here’s a simple test: First find outcomes A and B such that A<B (if you can’t even do that, its utility function is constant, hence bounded). Then pick an absurdly tiny probability p>0. Now see if you can find such a terrible C and such a wonderful D that pC + (1-p)B < pD + (1-p)A. If, for every p>0 you can find such C and D, then its utility function is unbounded. But if for some p>0, you cannot find any C and D that will suffice, even when you probe the extremes of goodness and badness, then its utility function is bounded. This test should sound familiar. What I’m getting at here is that one does not bound their unbounded utility function so that they don’t have to pay Pascal’s mugger; your preferences were simply bounded all along, and your response to Pascal’s mugger is proof.
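A minimal sketch of this test in code; the two utility functions and the probed extremes are assumptions chosen purely for illustration:

```python
# Illustrative only: u(x) = x and u(x) = tanh(x) stand in for an unbounded
# and a bounded utility function; the probed extremes are arbitrary.
import math

def flip_exists(u, p, A=0.0, B=1.0, extremes=(1e3, 1e6, 1e9, 1e12)):
    """Is there a terrible C and wonderful D with pC+(1-p)B < pD+(1-p)A?

    In utilities: does p*(u(D) - u(C)) exceed (1-p)*(u(B) - u(A)) for
    some probed C = -x, D = +x?"""
    gap = (1 - p) * (u(B) - u(A))
    return any(p * (u(x) - u(-x)) > gap for x in extremes)

for p in (1e-2, 1e-4, 1e-8):
    print(f"p={p:.0e}: unbounded flip={flip_exists(lambda x: x, p)}, "
          f"bounded flip={flip_exists(math.tanh, p)}")
# With u(x) = x, sufficiently extreme C and D produce a flip at every p.
# With u(x) = tanh(x), once p*(sup u - inf u) < (1-p)*(u(B) - u(A)), no C
# and D suffice -- the signature of a bounded utility function.
```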
Look, we’re arguing past each other here. My logical response would be to add more options to the system, which would remove the problem you identified (and I don’t understand your house insurance example—this is just the seat-belt decision again as a one-shot, and I would address it by looking at all the financial decisions you make in your life—and if that’s not enough, all the decisions, including all the “don’t do something clearly stupid and pointless” ones).
What I think is clear is:
a) Median maximalisation makes bad decisions in isolated problems.
b) If we combine all the likely decisions that a median maximiser will have to make, the quality of the decisions increases.
If you want to argue against it, either say that a) is bad enough that we should reject the approach anyway, even if it decides well in practice, or find examples where a median maximaliser will make bad decisions even in the real world (if you would pay Pascal’s mugger, then you could use that as an example).
I don’t understand your house insurance example—this is just the seat-belt decision again as a one-shot
We were modeling the seat-belt decision as something that makes the difference between being dead and being completely fine in the event of an accident (which I suppose is not very realistic, but whatever). I was trying to point to a situation where an event can happen which is bad enough to put you in the bottom half of outcomes either way, so that nothing that happens conditional on the event can affect the median outcome, but a decision you can make ahead of time would make the difference between bad and worse.
I do think that a) is bad enough, because a decision procedure that does poorly in isolated problems is wrong, and thus cannot be expected to do well in realistic situations, as I mentioned previously. I guess b) is probably technically true, but it is not enough for the quality of the decisions to increase as the number of decisions increases; it should actually increase towards a limit that isn’t still awful, and come close to achieving that limit (I’m pretty sure it fails on at least one of those, though which step it fails on might depend on how you make things precise). I’ve given examples where median maximizers make bad decisions in the real world, but you’ve dismissed them with vague appeals to “everything will be fine when you consider it in the context of all the other decisions it has to make”.
I’ve given examples where median maximizers make bad decisions in the real world, but you’ve dismissed them with vague appeals to “everything will be fine when you consider it in the context of all the other decisions it has to make”.
And I’ve added in the specific other decisions needed to achieve this effect. I agree it’s not clear what exactly median maximalisation converges on in the real world, but the examples you’ve produced are not sufficient to show it’s bad.
I do think that a) is bad enough, because a decision procedure that does poorly in isolated problems is wrong
My take on this is that counterfactual decisions count as well. ie if humans look not only at the decisions they face, but the ones they can imagine facing, then median maximalisation is improved. My justification for this line of thought is—how do you know that one chocolate cake is +10 utility while one coffee is +2 (and two coffees is +3, three is +2, and four is −1)? Not just the ordinal ranking, but the cardinality. I’d argue that you get this by either experiencing circumstances where you choose a 20% chance of a cake over coffee, or imagining yourself in that circumstance. And if imagination and past experiences are valid for the purpose of constructing your utility function, they should be valid for the purpose of median-maximalisation.
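For what it’s worth, the 20% lottery does pin down exactly those cardinal values, assuming indifference at 20% and normalizing the utility of getting nothing to 0:

```latex
% Indifference between a coffee for sure and a 20% chance of cake
% (normalizing u(nothing) = 0) pins down the cardinal value:
\[
u(\text{coffee}) = 0.2\,u(\text{cake}) + 0.8\,u(\text{nothing})
                 = 0.2 \times 10 = 2.
\]
```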
And I’ve added in the specific other decisions needed to achieve this effect.
That you claim achieve that effect. But as I said, unless the choices you can make that would protect you from light injury involve less inconvenience per % reduction in risk than the choices you can make that would protect you from death, it doesn’t work.
However, I did think of something which seems to sort of achieve what you want: if you have high uncertainty about what the value of your utility function will be, then adding something to it with some probability will have a significant effect on the median value, even if the probability is significantly less than 50%. For instance, a 49% chance of death is very bad because if there’s a 49% chance you die, then the median outcome is one in which you’re alive but in a worse situation than all but 1/51 of the scenarios in which you survive. It may be that this is what you had in mind, and adding future decisions that involve uncertainty was merely a mechanism by which large uncertainty about the outcome was introduced, in which case future-you actually getting to make any choices about them was a red herring. I still don’t find this argument convincing either, though, both because it still undervalues protection from risks of losses that are large relative to the rest of your uncertainty about the value of the outcome (for instance, note that when valuing reductions in risk of death, there is still a weird discontinuity around 50%), and because it assumes that you can’t make decisions that selectively have significant consequences only in very good or very bad outcomes (this is what I was getting at with the house insurance example).
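A quick numeric check of the 1/51 claim, with an assumed N(0,1) spread over survival outcomes and an arbitrary very low utility for death:

```python
# Illustrative only: survival utilities ~ N(0,1) and death at an arbitrary
# very low utility.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
dead = rng.random(n) < 0.49
u = np.where(dead, -100.0, rng.normal(0.0, 1.0, n))

median = np.median(u)
frac_below = (u[~dead] < median).mean()
print("median utility:", round(median, 2))                   # ~ -2.06
print("survival outcomes below it:", round(frac_below, 4))   # ~ 1/51 ~ 0.0196
# The median outcome is a survival, but one worse than all but ~1/51 of the
# survival scenarios, so a 49% risk of death drags the median down hard.
```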
My take on this is that counterfactual decisions count as well. … And if imagination and past experiences are valid for the purpose of constructing your utility function, they should be valid for the purpose of median-maximalisation.
I don’t understand what you’re saying here. Is it that you can maximize the median value of the mean of the values of your utility function in a bunch of hypothetical scenarios? If so, that sounds kind of like Houshalter’s median of means proposal, which approaches mean maximization as the number of samples considered approaches infinity.
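For reference, a minimal sketch of a median-of-means evaluation as I understand that proposal; the utility distribution and block count are arbitrary assumptions:

```python
# Illustrative only: the utility distribution and block count are arbitrary.
import numpy as np

def median_of_means(samples, n_blocks):
    """Median of the per-block means of the samples."""
    return np.median([b.mean() for b in np.array_split(samples, n_blocks)])

rng = np.random.default_rng(0)
# A skewed utility distribution: usually modest, occasionally huge payoffs.
samples = rng.lognormal(0.0, 2.0, 100_000)

print("median:", round(np.median(samples), 2))                    # ~1
print("median of means:", round(median_of_means(samples, 100), 2))
print("mean:", round(samples.mean(), 2))                          # ~e^2 ~ 7.4
# With one sample per block this is plain median maximization; as the block
# size grows, the block means concentrate on the true mean, so the median
# of means approaches mean maximization.
```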
The observation I have is that when facing many decisions, median maximalisation tends to move close to mean maximalisation (since the central limit theorem gives convergence in distribution, the median will converge to the mean in the case of averaging repeated independent processes; but there are many other examples of this). Therefore I’m considering what happens if you add “all the decisions you can imagine making” to the set of actual decisions you expect to make. This feels like it should move the two even closer together.
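A quick simulation of that convergence, using an exponential distribution as an arbitrary skewed example:

```python
# Illustrative only: exponential(1) is an arbitrary skewed distribution.
import numpy as np

rng = np.random.default_rng(0)
for n in (1, 10, 100, 1000):
    # Average of n iid draws, replicated 10,000 times.
    averages = rng.exponential(1.0, (10_000, n)).mean(axis=1)
    print(f"n={n:5d}  median={np.median(averages):.3f}  mean={averages.mean():.3f}")
# For n=1 the median is ln(2) ~ 0.693 vs a mean of 1.0; by n=1000 they are
# nearly identical, which is the sense in which a median maximiser of the
# aggregate behaves like a mean maximiser.
```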
Ah, are you saying you should use your prior to choose a policy that maximizes your median utility, and then implement that policy, rather than updating your prior with your observations and then choosing a policy that maximizes the median? So like UDT but with medians?
It seems difficult to analyze how it would actually behave, but it seems likely that it acts much more similarly to mean utility maximization than it would if you updated before choosing the policy. Both of these properties (difficulty of analysis, and similarity to mean maximization) make it difficult to identify problems that it would perform poorly on. But this also makes it difficult to defend its alleged advantages (for instance, if it ends up being too similar to mean maximization, and if you use an unbounded utility function as you seem to insist, perhaps it pays Pascal’s mugger).
Ah, are you saying you should use your prior to choose a policy that maximizes your median utility, and then implement that policy, rather than updating your prior with your observations and then choosing a policy that maximizes the median? So like UDT but with medians?
Ouch! Sorry for not being clear. If you missed that, then you can’t have understood much of what I was saying!