Define me a process with all those properties except the last one.
Well, there’s my old idea here: http://lesswrong.com/lw/8qb/cevinspired_models/ . I don’t think it’s particularly good, but it does construct a utility function, and might be doable with good enough models or a WBE. More broadly, there’s the general “figure out human preferences from their decisions and from hypothetical questions and fit a utility function to it”, which we can already do today (see “inverse reinforcement learning”); we just can’t do it well enough, yet, to get something generally safe at the other end.
None of these ideas have independent variants (not technically true; I can think of some independent versions of them, but they’re so ludicrously unsafe in our world that we’d rule them out immediately; thus, this would be a non-independent process).
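For concreteness, here is a minimal toy sketch of the “fit a utility function to observed choices” idea, assuming a logit (Bradley–Terry) choice model with made-up outcomes and utilities; it illustrates the general shape of the approach, not the linked proposal or any particular IRL algorithm:

```python
# Toy sketch: recover a utility function from noisy pairwise choices.
# (Illustrative only; the outcomes, utilities, and choice model are made up.)
import numpy as np

rng = np.random.default_rng(0)

outcomes = ["walk", "bus", "taxi", "stay_home"]
true_u = np.array([1.0, 0.5, 2.0, 0.0])  # hidden "human" utilities

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Simulate observed choices: the human picks i over j with
# probability sigmoid(true_u[i] - true_u[j]).
pairs, choices = [], []
for _ in range(1000):
    i, j = rng.choice(len(outcomes), size=2, replace=False)
    pairs.append((i, j))
    choices.append(1.0 if rng.random() < sigmoid(true_u[i] - true_u[j]) else 0.0)

# Fit utilities by gradient ascent on the log-likelihood of the observed choices.
u = np.zeros(len(outcomes))
for _ in range(300):
    grad = np.zeros_like(u)
    for (i, j), y in zip(pairs, choices):
        p = sigmoid(u[i] - u[j])
        grad[i] += y - p
        grad[j] -= y - p
    u += 0.5 * grad / len(pairs)

u -= u[outcomes.index("stay_home")]  # utilities are only identified up to a shift
print(dict(zip(outcomes, np.round(u, 2))))  # roughly recovers true_u, up to noise
```

The logit noise model here is just one convenient choice; real inverse reinforcement learning work typically infers a reward function from sequential behaviour rather than from bare pairwise choices.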
If you are neutral between .4A+.6C and .4B+.6C, then you don’t have a very good claim to preferring A over B.
?
If I actually do prefer A over B (and my behaviour reflects that in (1-ɛ)A + ɛC versus (1-ɛ)B + ɛC cases), then I have an extremely good claim to preferring A over B, and an extremely poor claim to independence.
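For reference, the independence axiom at issue is the standard VNM one: for all lotteries A, B, C and all p in (0, 1],

```latex
A \succ B \iff pA + (1-p)C \;\succ\; pB + (1-p)C
```

i.e. it is exactly the assumption that ties preferences over the ɛ-mixtures above to the underlying preference between A and B.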
I assumed accuracy was implied by “making a mess of preferences into a utility function”.
More broadly, there’s the general “figure out human preferences from their decisions and from hypothetical questions and fit a utility function to it”, which we can already do today (see “inverse reinforcement learning”); we just can’t do it well enough, yet, to get something generally safe at the other end.
I’m somewhat skeptical of that strategy for learning utility functions, because the space of possible outcomes is extremely high-dimensional, and it may be difficult to test extreme outcomes because the humans you’re trying to construct a utility function for might not be able to understand them. But perhaps this objection doesn’t get to the heart of the matter, and I should put it aside for now.
None of these ideas have independent variants
I am admittedly not well-versed in inverse reinforcement learning, but this is a perplexing claim. Except for a few people like you suggesting alternatives, I’ve only ever heard “utility function” used to refer to a function you maximize the expected value of (if you’re trying to handle uncertainty), or a function you just maximize the value of (if you’re not trying to handle uncertainty). In the first case, we have independence. In the second case, the question of whether or not we obey independence doesn’t really make sense. So if inverse reinforcement learning violates independence, then what exactly does it try to fit to human preferences?
If I actually do prefer A over B
Then if the only difference between two gambles is that one might give you A when the other might give you B, you’ll take the one that might give you something you like instead of something you don’t like.
I’ve only ever heard “utility function” used to refer to
To be clear, I am saying the process of constructing the utility function violates independence, not that subsequently maximising it does. Similarly, choosing a median-maximising policy P violates independence, but there is (almost certainly) a utility u such that maximising u is the same as following P.
Once the first choice is made, we have independence in both cases; before it is made, we have it in neither. The philosophical underpinning of independence in single decisions therefore seems very weak.
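A toy numerical illustration of that (my own construction, using the lower-median convention for ties):

```python
# A median-maximiser can prefer A to B, yet prefer the B-mixture to the
# A-mixture once both are mixed with C -- an independence violation.
def lower_median(lottery):
    """Lottery = list of (value, probability) pairs; return its lower median."""
    cumulative = 0.0
    for value, prob in sorted(lottery):
        cumulative += prob
        if cumulative >= 0.5:
            return value

def mix(p, lot1, lot2):
    """The compound lottery p*lot1 + (1-p)*lot2."""
    return [(v, p * q) for v, q in lot1] + [(v, (1 - p) * q) for v, q in lot2]

A = [(1, 1.0)]             # 1 for sure
B = [(0, 0.6), (10, 0.4)]  # usually 0, sometimes 10
C = [(10, 1.0)]            # 10 for sure

print(lower_median(A), lower_median(B))                            # 1 0   -> A preferred to B
print(lower_median(mix(0.5, A, C)), lower_median(mix(0.5, B, C)))  # 1 10  -> B-mixture preferred
```

On their own, A beats B (median 1 vs 0); mixed half-and-half with C, the B-mixture wins (median 10 vs 1). An expected-utility maximiser can never flip like that, since EU(0.5A + 0.5C) - EU(0.5B + 0.5C) = 0.5*(EU(A) - EU(B)).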
To be clear, I am saying the process of constructing the utility function violates independence
Feel free to tell me to shut up and learn how inverse reinforcement learning works before bothering you with such questions, if that is appropriate, but I’m not sure what you mean. Can you be more precise about what property you’re saying inverse reinforcement learning doesn’t have?
Inverse reinforcement learning relies on observing humans performing specific actions, and drawing the “right” conclusion as to what their preferences are. Indirectly, it also relies on humans tinkering with its code to remove “errors”, i.e. things that don’t fit with the mental image that human programmers have of what preferences should be.
Given that human desires are not independent (citation not needed), this process, if it produces a utility function, involves constructing something independent from non-independent input. However, to establish this utility function, the algorithm has access only to the particular problems given to it, and the particular mental images of its programmers. It is almost certain that the end result would be somewhat different if it was trained on different problems, or if its programmers had different intuitions. Therefore the process itself cannot be independent.
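As a toy illustration of that dependence (my own construction, not a description of any real IRL system): when the observed behaviour cannot be represented by any utility function, the “best-fitting” utility is determined by which comparisons happened to be in the training data.

```python
# Cyclic "human" preferences: A > B, B > C, C > A. No utility function fits all
# three, so the best fit depends entirely on which comparisons were observed.
from itertools import permutations

def best_fit_orderings(observed):
    """Utility orderings that agree with the most observed (winner, loser) pairs."""
    def score(order):
        rank = {x: i for i, x in enumerate(order)}  # lower index = higher utility
        return sum(rank[w] < rank[l] for w, l in observed)
    orders = list(permutations("ABC"))
    best = max(score(o) for o in orders)
    return [o for o in orders if score(o) == best]

print(best_fit_orderings([("A", "B"), ("B", "C")]))  # [('A', 'B', 'C')]
print(best_fit_orderings([("B", "C"), ("C", "A")]))  # [('B', 'C', 'A')]
```

Same underlying behaviour, two different samples of training problems, two different fitted utility functions.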
Ah, I see what you mean, and you’re right; the utility function constructed will depend on how the data points are sampled. This isn’t quite the same as saying that the result will depend on what results are actually available, though, unless knowledge about what results will be available is used to determine how to sample the data. This still seems like somewhat of a defect of inverse reinforcement learning, unless there ends up being a good case that some particular way of sampling the data is optimal for revealing underlying preferences and ignoring biases, or something like that.
Given that human desires are not independent (citation not needed)
That’s probably true, but on the other hand, you seem to want to pin the deviations of human behavior from VNM rationality on violations of the independence axiom, and it isn’t clear to me that this is the case (I don’t think the point you were making relies on this, so if you weren’t trying to make that claim then you can ignore this; it just seemed like you might be). There are situations where there are large framing effects (that is, whether A or B is preferred depends on how the options are presented, even if no other outcome C is being mixed in with them), and likely also violations of transitivity (where someone would say A>B, B>C, and C>A whenever you ask them about 2 of them without bringing up the third). It seems likely to me that most paradoxes of human decision-making have more to do with these than they do to violations of independence.