Are you proposing to build FAI based only on people’s revealed preferences? I’m not saying that’s a bad idea, but note that most of our noble-sounding goals disagree with our revealed preferences.
Approval or disapproval of certain behaviors, or of certain algorithms for extrapolation of preference, can also be a kind of decision. And not all behavior follows to any significant extent from decision making, in the sense of a consequentialist loop (from the dependence of utility on action, to the action). Finding goals in their decision-making role requires considering instances of decision making, not just of behavior.
You could certainly do that, but the problem still stands, I think.
The goal of extrapolating preferences is to answer questions like “is outcome X better or worse than outcome Y?” Your FAI might use revealed preferences of humans over extrapolation algorithms, or all sorts of other clever ideas. We want to always obtain a definite answer, with no option of saying “sorry, your question is confused”.
But such powerful methods could also be used to obtain yes/no answers to questions about trees falling in the forest, with no option of saying “sorry, your question is confused”. In this case the answers are clearly garbage. What makes you convinced that asking the algorithm about human preferences won’t result in garbage as well?
The goal of extrapolating preferences is to answer questions like “is outcome X better or worse than outcome Y?” … We want to always obtain a definite answer, with no option of saying “sorry, your question is confused”.
I distinguish the stage where a formal goal definition is formulated. So elicitation/extrapolation of preferences is part of the goal definition, while judgments* are made according to a decision algorithm that uses that goal definition.
Your FAI might use revealed preferences of humans over extrapolation algorithms, or all sorts of other clever ideas.
This was meant as an example to break the connotations of “revealed preferences” as a summary of tendencies in real-world behavior. The idea I was describing was to take all sorts of simple hypothetical events associated with humans, including their reflection on various abstract problems (which is not particularly “real world” in the way the phrase “revealed preferences” suggests), and to find a formal goal definition that, in some sense, has the most explanatory power when these events are treated as abstract consequentialist decisions made with that goal.
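To make the fitting step a bit more concrete, here is a minimal toy sketch in Python. The event format, the candidate goals, and the Boltzmann-style choice model are all my illustrative assumptions, not a claim about how an actual goal definition would be represented; the only point is the shape of the procedure, scoring candidate goals by how well they explain recorded decisions and keeping the best one.

```python
# Toy sketch only: fit a "goal" to recorded decisions by explanatory power.
# The event format, candidate goals, and choice model are all assumptions.
import math
from typing import Dict, List, Tuple

Event = Tuple[List[str], str]   # (options that were available, option actually taken)
Goal = Dict[str, float]         # a candidate goal: utility assigned to each option

def explanatory_power(goal: Goal, events: List[Event], beta: float = 3.0) -> float:
    """Log-likelihood of the recorded choices if they noisily optimized `goal`."""
    total = 0.0
    for options, chosen in events:
        z = sum(math.exp(beta * goal[o]) for o in options)
        total += math.log(math.exp(beta * goal[chosen]) / z)
    return total

def fit_goal(candidates: Dict[str, Goal], events: List[Event]) -> str:
    """Name of the candidate goal that best explains the events as decisions."""
    return max(candidates, key=lambda name: explanatory_power(candidates[name], events))

# Toy data: a few keystroke-level decisions about an email draft.
events = [
    (["send", "discard"], "send"),
    (["send", "edit"], "edit"),
    (["edit", "discard"], "edit"),
]
candidates = {
    "wants the message delivered": {"send": 1.0, "edit": 0.6, "discard": 0.0},
    "wants the message gone":      {"send": 0.0, "edit": 0.4, "discard": 1.0},
}
print(fit_goal(candidates, events))  # -> "wants the message delivered"
```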
But such powerful methods could also be used to obtain yes/no answers to questions about trees falling in the forest
I don’t think so. I’m talking about taking events, such as pressing certain buttons on a keyboard, and trying to explain them as consequentialist decisions (“Which goal does pressing the buttons this way optimize?”). This won’t work with just a few actions, so I don’t see how to apply it to individual utterances about trees, or what use a goal fitted to that behavior would be in resolving the meaning of words.
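A tiny self-contained illustration of why a few actions are not enough: a single observed choice is explained perfectly by every candidate goal that merely ranks that choice first, however those goals disagree about everything else. The options below are placeholders.

```python
# One observed action barely constrains the space of candidate goals.
from itertools import permutations

options = ["A", "B", "C", "D", "E", "F"]  # six hypothetical things one could do
chosen = "A"                              # the single action we actually observed

rankings = list(permutations(options))    # each ranking stands in for one candidate goal
fits = [r for r in rankings if r[0] == chosen]
print(f"{len(fits)} of {len(rankings)} candidate goals explain the single action perfectly")
# -> 120 of 720: the goals that fit still disagree about nearly everything else.
```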
[*] Or rather decisions: I’m not sure the notion of “outcome”, or even “state of the world”, can be fixed in this context. By analogy, the output of a program is an abstract property of its source code, and this output (a property of the source code) can sometimes be controlled without controlling the source code itself. If we fix a notion of the state of the world, maybe some of the world’s important abstract properties can likewise be controlled without controlling its state. If that is the case, it’s wrong to define a utility function over possible states of the world, since it would miss the distinctions between different hypothetical abstract properties of the same state of the world.
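A loose toy rendering of the analogy in this footnote, with all names made up: the output of `world` below is an abstract property of its fixed description, and that property is settled by what `agent` returns, without the description being edited. A utility that only reads the fixed description cannot register the distinction; a utility over the abstract property can.

```python
# Toy illustration of the footnote's analogy; every name here is made up.

def agent() -> str:
    # The decision under consideration. Changing this return value changes
    # world()'s output, an abstract property of a description that stays fixed.
    return "cooperate"

WORLD_SOURCE = "return 10 if agent() == 'cooperate' else 0"  # the fixed description

def world() -> int:
    # The world's dynamics, literally the fixed description above.
    return 10 if agent() == "cooperate" else 0

def utility_over_description(description: str) -> float:
    # Stand-in for a utility over "states": it sees only the fixed description,
    # so it assigns the same value whatever agent() decides.
    return float(len(description))

def utility_over_property(output: int) -> float:
    # Utility over the abstract property (the world's output) instead.
    return float(output)

print(utility_over_description(WORLD_SOURCE))  # unchanged regardless of the decision
print(utility_over_property(world()))          # tracks the decision
```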
a near FAI (revealed preference): everyone loudly complains about conditions while enjoying themselves immensely.
a far FAI (stated preference): everyone loudly proclaims our great success while being miserable.