But it’s not a grand climax, as I remarked yesterday—just a post that happened to require a lot of prerequisites.
Verbally instructing a powerful AI as to what we want would hardly suffice to make it safe, if the AI did not already know what we wanted (Type II genie).
I’m not going to spring some grand, predefined conclusion at the end of this. All that is being gradually sprung here is the ability to understand what the Friendly AI problem is, and why it is hard. I have observed that the chief difficulty I have with Friendly AI discussions is getting people to understand the question, the requirements faced by an attempted answer.
The genie only needs to have a terminal goal of interpreting instructions correctly. If it has that terminal goal, it will acquire the instrumental goal of checking for areas of ambiguity and misunderstanding, and the further instrumental goal of resolving them. At the point where the AI is satisfied that it has understood the instruction, it will know as much about human morality/preferences as it needs to understand the instruction correctly. It does not need to be preloaded with complete knowledge of morality/preferences: it will ask questions or otherwise research.
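As a rough sketch of that claim, assuming invented names throughout (Interpretation, ambiguity_score, ask_user are illustrations, not any real system or API): a genie whose only terminal goal is correct interpretation treats clarifying questions as an instrumental step whenever its candidate readings are too close to call.

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    meaning: str
    confidence: float  # agent's estimate that this reading matches the speaker's intent

def ambiguity_score(candidates: list[Interpretation]) -> float:
    """Ambiguity is high when no single reading clearly dominates the others."""
    best = max(c.confidence for c in candidates)
    return 1.0 - best

def interpret(instruction: str, candidates: list[Interpretation],
              ask_user, threshold: float = 0.2, max_rounds: int = 3) -> Interpretation:
    """Terminal goal: return the reading the speaker intended.
    Instrumental subgoal: resolve ambiguity (by asking) before committing."""
    for _ in range(max_rounds):
        if ambiguity_score(candidates) <= threshold:
            break
        answer = ask_user("About '" + instruction + "': which did you mean? "
                          + "; ".join(c.meaning for c in candidates))
        # Re-weight candidate readings in light of the clarification (toy update rule).
        for c in candidates:
            if answer.lower() in c.meaning.lower():
                c.confidence = min(1.0, c.confidence + 0.5)
    return max(candidates, key=lambda c: c.confidence)

# Example: one instruction, two candidate readings, one clarifying exchange.
readings = [Interpretation("move her out of the burning house unharmed", 0.5),
            Interpretation("eject her through the roof", 0.5)]
chosen = interpret("get grandma out of the burning house", readings,
                   ask_user=lambda q: "unharmed")  # stand-in for a real dialogue
print(chosen.meaning)  # "move her out of the burning house unharmed"
```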
The type II genie story is not very relevant to the wider UFAI issue, because the genie is posited as being non-sentient, apparently meaning it does not have full natural language, and also does not have any self-reflexive capabilities. As such, it can neither realise it is in a box, nor talk its way out. But why shouldn't an AI that is not linguistically gifted enough to talk its way out of a box be linguistically gifted enough to understand instructions correctly?
More problems with morality = preferences:
It has been stated that this post shows that all values are moral values (or that there is no difference between morality and valuation in general, or…), in contrast with the common sense view that there are clear examples of morally neutral preferences, such as preferences for different flavours of ice cream. I am not convinced by the explanation, since it also applies to non-moral preferences. If I have a lower-priority non-moral preference to eat tasty food, and a higher-priority preference to stay slim, I need to consider my higher-priority preference when wishing for yummy ice cream.

To be sure, an agent capable of acting morally will have morality among their higher-priority preferences: it has to be among the higher-order preferences, because it has to override other preferences for the agent to act morally. Therefore, when they scan their higher-priority preferences, they will happen to encounter their moral preferences. But that does not mean every preference is necessarily a moral preference. And their moral preferences override other preferences, which are therefore non-moral, or at least less moral.

“There is no safe wish smaller than an entire human morality.”

There is no safe wish smaller than the subset of the value structure, moral or amoral, that sits above it in priority. The subset below doesn't matter. However, a value structure need not be moral at all, and the lower storeys will probably be amoral even if the upper storeys are not. Therefore morality is in general a subset of preferences, as common sense maintained all along.
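A toy rendering of that priority claim, with an invented preference list and an invented conflict test standing in for any real model: the wish is checked only against whatever sits above it in priority, and whether those higher preferences happen to be moral is irrelevant to the check.

```python
from dataclasses import dataclass

@dataclass
class Preference:
    name: str
    priority: int   # higher number = overrides lower numbers
    moral: bool

preferences = [
    Preference("don't harm others", priority=3, moral=True),
    Preference("stay slim",         priority=2, moral=False),
    Preference("eat tasty food",    priority=1, moral=False),
]

def conflicts(wish: str, pref: Preference) -> bool:
    # Stand-in for a real judgement of whether granting the wish violates the preference.
    return pref.name == "stay slim" and "ice cream" in wish

def safe_to_grant(wish: str, wish_priority: int) -> bool:
    """Only the preferences *above* the wish in priority constrain it;
    the ones below it don't matter, and the ones above need not be moral."""
    higher = [p for p in preferences if p.priority > wish_priority]
    return not any(conflicts(wish, p) for p in higher)

print(safe_to_grant("a tub of yummy ice cream", wish_priority=1))
# False: the wish is blocked by the amoral, higher-priority "stay slim"
```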
The traditional idea with genies is that they give you what you wished for, but you missed the implications of what you wished to have.
It’s garbage in, garbage out.
The problem isn’t vague instructions but vague goals.
Yeah...didn’t I just argue against that?* A genie with the goal of interpreting instructions perfectly, and the competence to interpret instructions correctly, would interpret instructions correctly.
* Or at least stipulate it in the very first sentence.
There is a huge amount of complexity hidden beneath this simple description.
I’ll say it again: absolute complexity is not relative complexity.
Everything in AGI is very complex in absolute terms.
In relative terms, language is less complex than language+morality.
That would matter if you didn’t need language+morality to interpret language in this case. To interpret instructions correctly, you have to understand what they mean, and that requires a full understanding of the motivations underlying the request.
You don’t just need language, you need language+thought, which is even more complex than language+morality.
I am using “having language” to mean “having language plus thought”, i.e. having linguistic understanding, i.e. having the ability to pass a Turing Test. Language without thought is just parroting.
To follow instructions relating to morality correctly, an entity must be able to understand them correctly at the semantic level. An entity need not agree with them, or hold to them itself, as we can see from the ability of people to play along with social rules they don’t personally agree with.
No, that’s not right. Language + thought is to understand language and be able to fully model the mind-state of the person who was speaking to you. If you don’t have this, and just have language, ‘get grandma out of the burning house’ gets you the lethal ejector-seat method. If you want do-what-I-mean rather than do-what-I-say, you need full thought modeling. Which is obviously harder than language + morality, which requires only being able to parse language correctly and understand a certain category of thought.
Or to phrase it a different way: language on its own gets you nothing productive, just a system that can correctly parse statements. To understand what they mean, rather than what they say, you need something much broader, and language+morality is smaller than that broad thing.
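One toy way to picture that do-what-I-say versus do-what-I-mean gap, with made-up plans and a made-up speaker model standing in for the full mind-state modeling the comment above describes: the literal reading optimises only the stated condition, while the intent-aware reading also scores plans against what the speaker is modelled as caring about.

```python
stated_goal = "grandma is outside the burning house"

plans = {
    "ejector seat through the roof": {"grandma_outside": True,  "grandma_unharmed": False},
    "carry her out the front door":  {"grandma_outside": True,  "grandma_unharmed": True},
    "do nothing":                    {"grandma_outside": False, "grandma_unharmed": True},
}

# Inferred speaker model: unstated constraints recovered by modelling the speaker's mind.
speaker_cares_about = {"grandma_outside": 1.0, "grandma_unharmed": 10.0}

def do_what_i_say(plans):
    # Pick any plan that literally satisfies the stated condition.
    return next(name for name, outcome in plans.items() if outcome["grandma_outside"])

def do_what_i_mean(plans, speaker_model):
    # Pick the plan that best satisfies everything the speaker is modelled as wanting.
    def score(outcome):
        return sum(weight for key, weight in speaker_model.items() if outcome[key])
    return max(plans, key=lambda name: score(plans[name]))

print(do_what_i_say(plans))                         # "ejector seat through the roof"
print(do_what_i_mean(plans, speaker_cares_about))   # "carry her out the front door"
```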
Fully understanding the semantics of morality may be simpler than fully understanding the semantics of everything, but it doesn’t get you AI safety, because an AI can understand something without being motivated to act on it.
When I wrote “language”, I meant words + understanding… understanding in general, therefore including understanding of ethics… and when I wrote “morality” I meant a kind of motivation.
(Alice in Wonderland)