Yeah, it’s weird that Eliezer’s metaethics and FAI seem to rely on figuring out “true meanings” of certain words, when Eliezer also wrote a whole sequence explaining that words don’t have “true meanings”.
For example, Eliezer’s metaethical approach (if it worked) could be used to actually answer questions like “if a tree falls in the forest and no one’s there, does it make a sound?”, not just declare them meaningless :-) Namely, it would say that “sound” is not a confused jumble of “vibrations of air” and “auditory experiences”, but a coherent concept that you can extrapolate by examining lots of human brains. Funny I didn’t notice this tension until now.
I’ve argued before that CEV is just a generic method for solving confusing problems (simulate a bunch of smart and self-improving people and ask them what the answers are), and the concept (as opposed to the actual running of it) offers no specific insights into the nature of morality.
In the case of “if a tree falls in the forest and no one’s there, does it make a sound?”, “extrapolating” would work pretty well, I think. The extrapolation could start with someone totally confused about what sound is (e.g., “it’s something that God created to let me hear things”), and then move on to a confused jumble of “vibrations of air” and “auditory experiences”, and then to the understanding that by “sound” people sometimes mean “vibrations”, sometimes “experiences”, and sometimes are just confused.
ETA: I agree with Chris that it’s not clear what the connection between your comment and the post is. Can you explain?
I admit the connection is pretty vague. Chris mentioned “skill at understanding humans”, which made me recall Eliezer’s sequence on words, and something just clicked, I guess. Sorry for derailing the discussion.
I’m guessing the decision-making role is a more accurate guide to human goals than the usage of words in describing them.
Are you proposing to build FAI based only on people’s revealed preferences? I’m not saying that’s a bad idea, but note that most of our noble-sounding goals disagree with our revealed preferences.
Approval or disapproval of certain behaviors, or of certain algorithms for extrapolation of preference, can also be a kind of decision. And not all behavior follows to any significant extent from decision making, in the sense of following a consequentialist loop (from dependence of utility on action, to action). Finding goals in their decision-making role requires considering instances of decision making, not just of behavior.
You could certainly do that, but the problem still stands, I think.
The goal of extrapolating preferences is to answer questions like “is outcome X better or worse than outcome Y?” Your FAI might use revealed preferences of humans over extrapolation algorithms, or all sorts of other clever ideas. We want to always obtain a definite answer, with no option of saying “sorry, your question is confused”.
But such powerful methods could also be used to obtain yes/no answers to questions about trees falling in the forest, with no option of saying “sorry, your question is confused”. In this case the answers are clearly garbage. What makes you convinced that asking the algorithm about human preferences won’t result in garbage as well?
I distinguish the stage where a formal goal definition is formulated. So elicitation/extrapolation of preferences is part of the goal definition, while judgments* are made according to a decision algorithm that uses that goal definition.
The point about preferences over extrapolation algorithms was meant as an example to break the connotations of “revealed preferences” as a summary of tendencies in real-world behavior. The idea I was describing was to take all sorts of simple hypothetical events associated with humans, including their reflection on various abstract problems (which is not particularly “real world” in the way the phrase “revealed preferences” suggests), and to find a formal goal definition that in some sense has the most explanatory power in accounting for these events as abstract consequentialist decisions made with that goal.
I don’t think these methods could be applied to the tree question. I’m talking about taking events, such as pressing certain buttons on a keyboard, and trying to explain them as consequentialist decisions (“Which goal does pressing the buttons this way optimize?”); a toy sketch of this kind of goal-fitting follows after the footnote. This won’t work with just a few actions, so I don’t see how to apply it to individual utterances about trees, or what use a goal fitted to that behavior would be in resolving the meaning of words.
[*] Or rather decisions: I’m not sure the notion of “outcome” or even “state of the world” can be fixed in this context. By analogy, the output of a program is an abstract property of its source code, and this output (a property of the source code) can sometimes be controlled without controlling the source code itself. If we fix a notion of the state of the world, maybe some of the world’s important abstract properties can be controlled without controlling its state. If that is the case, it’s wrong to define a utility function over possible states of the world, since it would miss the distinctions between different hypothetical abstract properties of the same state of the world.
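As a purely illustrative sketch of this kind of goal-fitting (not the actual proposal): one could score candidate goal definitions by how well they explain a handful of observed choices under a noisy-rational softmax choice model, and keep the one with the most explanatory power. All events, options, candidate goals, and numbers below are made up.

```python
# Toy sketch: fit a "goal" to observed decisions by scoring candidate utility
# functions on how well they explain the observed choices under a softmax
# (noisy-rational) choice model. Everything here is hypothetical.
import math

# Hypothetical observed events: in each situation the person picked one option.
observations = [
    {"options": ["press_A", "press_B"], "chosen": "press_A"},
    {"options": ["press_A", "press_C"], "chosen": "press_C"},
    {"options": ["press_B", "press_C"], "chosen": "press_C"},
]

# Hypothetical candidate goal definitions: a utility for each option.
candidate_goals = {
    "likes_A": {"press_A": 1.0, "press_B": 0.0, "press_C": 0.5},
    "likes_C": {"press_A": 0.3, "press_B": 0.0, "press_C": 1.0},
}

def log_likelihood(utility, events, beta=1.0):
    """Log-probability of the observed choices if the person were (noisily)
    maximizing `utility`; `beta` controls how noisy the choices are."""
    total = 0.0
    for event in events:
        weights = {o: math.exp(beta * utility[o]) for o in event["options"]}
        total += math.log(weights[event["chosen"]] / sum(weights.values()))
    return total

# The candidate goal with the most explanatory power over these events.
best_goal = max(candidate_goals,
                key=lambda g: log_likelihood(candidate_goals[g], observations))
print(best_goal)  # -> "likes_C": the goal that best explains the button presses
```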
A near FAI (revealed preference): everyone loudly complains about conditions while enjoying themselves immensely.
A far FAI (stated preference): everyone loudly proclaims our great success while being miserable.
Yeah. Just because there is no “true meaning” of the word “want” doesn’t mean there won’t be difficult questions about what we really want, once we fix a definition of “want.”
(1) This was not the point of my post.
(2) In fact I see no reason to think what you say is true.
(3) Now I’m double-questioning whether my initial post was clearly written enough.
Does it rely on true meanings of words, particularly? Why not on concepts? Individually, “vibrations of air” and “auditory experiences” can be coherent.
What’s the general algorithm you can use to determine if something like “sound” is a “word” or a “concept”?
If it extrapolates coherently, then it’s a single concept, otherwise it’s a mixture :)
This may actually be doable, even at the present level of technology. You gather a huge text corpus, find the contexts where the word “sound” appears, and do the clustering using some word co-occurrence metric. The result is a list of different meanings of “sound”, and a mapping from each mention to the specific meaning. You can also do this simultaneously for many words together; then it becomes a global optimization problem. (A rough sketch of the single-word version is below.)
Of course, AGI would be able to do this at a deeper level than this trivial syntactic one.
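A rough sketch of the trivial syntactic version, assuming a toy corpus, bag-of-words context vectors, and k-means clustering purely for illustration; a serious attempt would need a large corpus and a better context representation.

```python
# Toy word-sense clustering sketch: cluster the contexts in which "sound"
# occurs, treating each resulting cluster as one candidate "meaning" of the
# word. The corpus and parameters are made up for illustration only.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the sound wave traveled through the air as a vibration",
    "vibration frequency determines the pitch of a sound wave",
    "she heard a strange sound and the experience frightened her",
    "he heard the sound clearly as a vivid auditory experience",
]

# Keep only the contexts in which the target word appears.
contexts = [s for s in corpus if "sound" in s.split()]

# Represent each context by its co-occurring words (ignoring the target itself).
vectorizer = CountVectorizer(stop_words=["sound"])
X = vectorizer.fit_transform(contexts)

# Each cluster stands for one candidate meaning; `labels` maps every mention
# of the word to the meaning it was assigned.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for sentence, label in zip(contexts, labels):
    print(label, "-", sentence)
```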