...humans have preference over their state of knowledge...
Why do you think it is inconsistent with having a utility function?
They have preferences like ambiguity aversion, eg being willing to pay to find out, during a holiday, whether they were accepted for a job, while knowing that they can’t make any relevant decisions with that early knowledge. This is not compatible with following a standard utility function.
I don’t know what you mean by “standard” utility function. I don’t even know what you mean by “following”. We want to find out because uncertainty makes us nervous, being nervous is unpleasant, and pleasure is a terminal value. This is entirely consistent with having a utility function, and with my formalism in particular.
Humans are not ideal rational optimizers of their respective utility functions.
Then why claim that they have one? If humans have intransitive preferences (A > B > C > A), as I often do, why claim that their preferences are actually secretly transitive and that they merely fail to act on them properly?
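As an aside, the standard argument for why intransitive preferences are irrational is the “money pump”: an agent with cyclic preferences can be charged a fee for a sequence of trades that leaves it exactly where it started. A minimal sketch (the option names and the 1-unit fee are illustrative, not from the discussion above):

```python
# prefers[x] is the option the agent likes strictly better than x.
# The cycle A > B > C > A means: given B take A, given C take B, given A take C.
prefers = {"B": "A", "C": "B", "A": "C"}

def money_pump(start, rounds, fee=1):
    """Trade the agent up its (cyclic) preference chain, charging a fee each time."""
    holding, paid = start, 0
    for _ in range(rounds):
        holding = prefers[holding]  # the agent happily trades for the preferred option
        paid += fee                 # ...and pays a small fee for each "upgrade"
    return holding, paid

holding, paid = money_pump("A", rounds=3)
print(holding, paid)  # after 3 trades the agent holds A again, but is 3 units poorer
```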
In what epistemology are you asking this question? That is, what is the criterion according to which the validity of an answer would be determined?
If you don’t think human preferences are “secretly transitive”, then why do you suggest the following:
Whenever revealed preferences are non-transitive or non-independent, use the person’s stated meta-preferences to remove the issue. The AI thus calculates what the person would say if asked to resolve the transitivity or independence (for people who don’t know about the importance of resolving them, the AI would present them with a set of transitive and independent preferences, derived from their revealed preferences, and have them choose among them).
What is the meaning of asking a person to resolve intransitivities if there are no transitive preferences underneath?
That is, what is the criterion according to which the validity of an answer would be determined?
Those are questions for you, not for me. You’re claiming that humans have a hidden utility function. What do you mean by that, and what evidence do you have for your position?
I’m claiming that it is possible to define the utility function of any agent. For unintelligent “agents” the result is probably unstable. For intelligent agents the result should be stable.
The evidence is that I have a formalism which produces this definition in a way compatible with intuition about “agent having a utility function”. I cannot present evidence which doesn’t rely on intuition since that would require having another more fundamental definition of “agent having a utility function” (which AFAIK might not exist). I do not consider this to be a problem since all reasoning falls back to intuition if you ask “why” sufficiently many times.
I don’t see any meaningful definition of intelligence or instrumental rationality without a utility function. If we accept that humans are (approximately) rational / intelligent, they must (in the same approximation) have utility functions.
It also seems to me (again, intuitively) that the very concept of “preference” is incompatible with e.g. intransitivity. To the extent that it makes sense to speak of “preferences” at all, it makes sense to speak of preferences compatible with the VNM axioms, and hence with a utility function. The same goes for the concept of “should”. If it makes sense to say one “should” do something (for example, build an FAI), there must be a utility function according to which she should do it.
Bottom line: eventually it all hits philosophical assumptions which have no further formal justification. However, this is true of all reasoning. IMO the only valid way to disprove such assumptions is either by reductio ad absurdum or by presenting a different set of assumptions which is better in some sense. If you have such an alternative set of assumptions for this case, or a wholly different way to resolve philosophical questions, I would be very interested to know.
I’m claiming that it is possible to define the utility function of any agent.
It is trivially possible to do that. Since no two choices are strictly identical, you just add enough details to make each choice unique, and then choose a utility function that will always reach that choice (“subject has a strong preference for putting his left foot forwards when seeing an advertisement for deodorant on Tuesday mornings that fall on the birthdays of prominent Dutch politicians”).
A good simple model of human behaviour is that of different modules expressing preferences and short-circuiting the decision-making in some circumstances, with a more rational system (“system 2”) occasionally intervening to prevent losses through money pumps. So people are transitive in their ultimate decisions, often and to some extent, but their actual decisions depend strongly on which choices are presented first (i.e. their low-level preferences are intransitive, but the rational part of them prevents loops). Would you say these beings have no preferences?
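A toy version of that two-module model, with assumed names: a “system 1” that expresses possibly intransitive pairwise preferences, and a “system 2” that vetoes any trade which would revisit a previously held option, thereby preventing money-pump loops without making the underlying preferences transitive:

```python
def system1_prefers(a, b):
    # Intransitive low-level preferences: A > B, B > C, C > A (illustrative).
    cycle = {("A", "B"), ("B", "C"), ("C", "A")}
    return (a, b) in cycle

def accept_trade(holding, offer, history):
    # System 1 short-circuits: take whatever looks better pairwise...
    wants = system1_prefers(offer, holding)
    # ...unless system 2 notices we have already held the offered option.
    if wants and offer in history:
        return False  # veto: this trade would close a money-pump loop
    return wants

holding = "A"
history = {"A"}
for offer in ["C", "B", "A"]:
    if accept_trade(holding, offer, history):
        holding = offer
        history.add(offer)
print(holding)  # the agent trades A -> C -> B but refuses the loop-closing trade back to A
```

Note that the final decisions exhibit no loop even though no transitive preference ordering exists at the low level, which is exactly the ambiguity the question is probing.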
I’m claiming that it is possible to define the utility function of any agent.
It is trivially possible to do that. Since no two choices are strictly identical, you just add enough details to make each choice unique, and then choose a utility function that will always reach that choice
My formalism doesn’t work like that, since the utility function is a function over possible universes, not over possible choices. There is no trivial way to construct a utility function with respect to which the given agent’s intelligence is close to maximal. However, it might still be the case that we need to give larger weight to simple utility functions (otherwise we’re left with selecting a maximum in an infinite set, and it’s not clear why it exists). As I said, I don’t have the final formula.
A good simple model of human behaviour is that of different modules expressing preferences and short-circuiting the decision-making in some circumstances, with a more rational system (“system 2”) occasionally intervening to prevent losses through money pumps. So people are transitive in their ultimate decisions, often and to some extent, but their actual decisions depend strongly on which choices are presented first (i.e. their low-level preferences are intransitive, but the rational part of them prevents loops). Would you say these beings have no preferences?
I’d say they have a utility function. Imagine a chess AI that selects moves by one of two strategies. The first strategy (“system 1”) uses simple heuristics like “check when you can” that produce an answer quickly and save precious time. The second strategy (“system 2”) runs a minimax algorithm with a 10-move-deep search tree. Are all of the agent’s decisions perfectly rational? No. Does it have a utility function? Yes: winning the game.
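A minimal sketch of that two-strategy move selector, on toy data rather than a real chess engine (all names, the move list, and the stored evaluations are illustrative assumptions; a real “system 2” would search the tree rather than read off a stored score):

```python
def heuristic_move(moves):
    # "System 1": a cheap heuristic, e.g. grab a capture when one is available.
    captures = [m for m in moves if m.get("capture")]
    return captures[0] if captures else moves[0]

def minimax_move(moves, depth):
    # "System 2" stand-in: take the move with the best evaluation (a real
    # engine would compute this by searching `depth` plies of the game tree).
    return max(moves, key=lambda m: m["eval"])

def choose_move(moves, time_left):
    # Short on time: let the fast heuristic short-circuit the decision.
    if time_left < 10:
        return heuristic_move(moves)
    return minimax_move(moves, depth=10)

moves = [
    {"name": "Qxf7", "capture": True,  "eval": -3.0},  # flashy capture, but losing
    {"name": "Nf3",  "capture": False, "eval": +0.5},  # quiet, sound move
]
print(choose_move(moves, time_left=5)["name"])   # heuristic grabs the capture: Qxf7
print(choose_move(moves, time_left=60)["name"])  # deeper evaluation prefers Nf3
```

The point of the sketch: the two subsystems disagree move-to-move, yet both are in the service of the same utility function, winning the game.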