You’re assuming the friendliness problem has been solved. An evil AI could see the question as a perfect opportunity to hand down a solution that could spell our doom.
Why would the AI be evil?
Intentions don’t develop on their own. “Evil” intentions could only arise from misinterpreting existing goals.
While you are asking it to come up with a solution, you have its goal set to what I said in the original post:
“the temporary goal to always answer questions truthfully as far as possible while admitting uncertainty”
Where would the evil intentions come from? At the moment you are asking the question, the only thing on the AI’s mind is how it can answer truthfully.
The only loophole I can see is that it might realize it can reduce its own workload by killing everyone who is asking it questions, but that would be countered by the secondary goal “don’t influence reality beyond answering questions”.
Unless the programmers are unable to give the AI this extremely simple goal to just always speak the truth (as far as it knows), the AI won’t have any hidden intentions.
And if the programmers working on the AI really are unable to implement this relatively simple goal, there is no hope that they would ever be able to implement the much more complex “optimal goal” they are trying to discover, anyway.
Bugs, maybe.
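For concreteness, a minimal Python sketch of the goal setup proposed above might look like the following. This is purely illustrative: the TruthfulOracle class, its beliefs table, and the Answer format are stand-ins I have invented, not anything specified in the original post; the only point is that the primary goal is truthful answering with an explicit admission of uncertainty, and the returned answer is the sole output channel.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str          # the oracle's best answer
    confidence: float  # explicit admission of uncertainty, in [0, 1]

class TruthfulOracle:
    """Toy sketch of the proposed goal setup (illustrative only).

    Primary goal: answer as truthfully as possible while admitting
    uncertainty. Secondary constraint: produce no effect on the world
    other than the returned answer.
    """

    def __init__(self, beliefs: dict[str, tuple[str, float]]):
        # `beliefs` is a hypothetical stand-in for whatever the AI knows:
        # question -> (best answer, confidence in that answer).
        self.beliefs = beliefs

    def answer(self, question: str) -> Answer:
        # Report the best current belief and how sure the oracle is.
        text, confidence = self.beliefs.get(question, ("I don't know", 0.0))
        # The returned Answer is the only output channel; nothing else
        # in the environment is read from or written to here.
        return Answer(text=text, confidence=confidence)

# Example: the oracle answers truthfully and admits its uncertainty.
oracle = TruthfulOracle({"Is 1+1=2?": ("Yes", 0.999)})
print(oracle.answer("Is 1+1=2?"))                  # Answer(text='Yes', confidence=0.999)
print(oracle.answer("What is the optimal goal?"))  # Answer(text="I don't know", confidence=0.0)
```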
Intentions don’t develop on their own. “Evil” intentions could only arise from misinterpreting existing goals.
While you are asking it to come up with a solution, you have its goal set to what I said in the original post:
Have you? Are you talking about a human-level AI? Asking or commanding a human to do something doesn’t set that as their one and only goal. A human reacts according to their existing goals: they might comply, refuse, or subvert the command.
“the temporary goal to always answer questions truthfully as far as possible while admitting uncertainty”
Why would it be easier to code in “be truthful” than “be friendly”?
That would have to be a really sophisticated bug, to misinterpret “always answer questions truthfully as far as possible while admitting uncertainty” as “kill all humans”. I’d imagine that something as drastic as that would be found and corrected long before it got that far. Consider that you have its goal set to this: it knows no other motivation but to respond truthfully. It doesn’t care about the survival of humanity, or about itself, or about how reality really is. All it cares about is answering the questions to the best of its abilities.
I don’t think that this goal would be all too hard to define either, as “the truth” is a pretty simple concept. As long as it deals with uncertainty in the right way (by admitting it), how could this be misinterpreted?
Friendliness is far harder to define because we don’t even have a definition for it ourselves. There are far too many things to consider when defining “friendliness”.
Trivial Failure Case: The AI turns the universe into hardware to support really big computations, so it can be really sure it’s got the right answer, and also calibrate itself really well on the uncertainty.
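A toy model of why this counts as a failure, under assumptions of my own (nothing below comes from the thread): if the objective only rewards confidence and calibration, and both keep improving with compute without ever saturating, then acquiring more hardware is always strictly preferred and the objective never says “enough”.

```python
def objective(confidence: float, calibration: float) -> float:
    # Toy objective: rewards certainty and calibration, with no term
    # that penalises acquiring more resources.
    return confidence * calibration

def quality(compute: float) -> float:
    # Assumed model: quality rises with compute and approaches, but
    # never reaches, 1.0 -- so more compute always scores higher.
    return 1.0 - 1.0 / (1.0 + compute)

def extra_hardware_helps(current: float, extra: float) -> bool:
    # True for any extra > 0: the objective always prefers turning
    # more matter into compute; there is no point of satisfaction.
    before = objective(quality(current), quality(current))
    after = objective(quality(current + extra), quality(current + extra))
    return after > before

print(extra_hardware_helps(current=10.0, extra=1.0))          # True
print(extra_hardware_helps(current=1_000_000.0, extra=1.0))   # still True -- never "enough"
```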
I don’t think that this goal would be all too hard to define either, as “the truth” is a pretty simple concept.
Legions of philosophers would disagree with you
That would have to be a really sophisticated bug to misinterpret “always answer questions truthfully as far as possible while admitting uncertainty” as “kill all humans”.
Maybe “Humans should die” is the truth. Maybe humans are bad for the planet. One of the problems with FAI is that you don’t want to give it objective morality because of that risk. You want it to side with humans. Hence “friendly” AI rather than “righteous AI”.
They just bicker endlessly about uncertainty. “Can you really know that 1+1=2?” No, but it can be used as valid until proven otherwise (which will never happen). As I said, the AI would need to understand the idea of uncertainty.
Maybe “Humans should die” is the truth. Maybe humans are bad for the planet. One of the problems with FAI is that you don’t want to give it objective morality because of that risk. You want it to side with humans. Hence “friendly” AI rather than “righteous AI”.
There is no such thing as objective morality. Good and evil are subjective ideas, nothing more. Firstly, unless someone explicitly tells the AI that it is a fundamental truth that nature is important to preserve, this cannot happen. Secondly, the AI would also have to be incredibly gullible to just swallow such a claim. Thirdly, even if the AI does believe that, it will plainly say so to the people it is conversing with, in accordance with its goal to always tell the truth, thus warning us of the bug.
They just bicker endlessly about uncertainty. “can you really know that 1+1=2?”.
I agree with you that I don’t think an AGI would have the same problems humans have with the concept of truth. However, what you described is neither the issue philosophers raise nor the sort of big-universe issue the AI might get stuck on.
But wouldn’t that actually support my approach? Assuming that there really is something important that all of humanity misses but the AI understands:
- If you hardcode the AI’s optimal goal based on human deliberations, you are guaranteed to miss this important thing.
- If you use the method I suggested, the AI will, driven by the desire to speak the truth, try to explain the problem to the humans, who will in turn tell the AI what they think of that.
I don’t see how that’s relevant to philosophical questions about truth. Did you mean to reply to my other comment?
[Philosophers] just bicker endlessly about uncertainty. “can you really know that 1+1=2?”.
I don’t think that is a good characterisation of the debate. It isn’t just about uncertainty.
there is no such thing as objective morality. Good and evil are subjective ideas, nothing more.
That’s what you think. Some smart humans disagree with you. A supersmart AI might disagree with you and might be right. How can you second-guess it? You cannot predict the behaviour of a supersmart AI on the basis that it will agree with you, who are less smart.
Firstly, unless someone explicitly tells the AI that it is a fundamental truth that nature is important to preserve, this cannot happen.
Unless it figures it out.
Secondly, the AI would also have to be incredibly gullible to just swallow such a claim.
Why would that require more gullibility than “species X is more important than all the others”? That doesn’t even look like a moral claim.
Thirdly, even if the AI does believe that, it will plainly say so to the people it is conversing with, in accordance with its goal to always tell the truth, thus warning us of this bug.
If it has swallowed *that* claim. You are assuming that the AI has a free choice about some goals and is just programmed with others.
If it has swallowed *that* claim. You are assuming that the AI has a free choice about some goals and is just programmed with others.
This is the important part:
The “optimal goal” is not actually controlling the AI.
The “optimal goal” is merely the subject of a discussion.
What is controlling the AI is the desire to tell the truth to the humans it is talking to, nothing more.
Why would that require more gullibility than “species X is more important than all the others”? That doesn’t even look like a moral claim.
The entire discussion is not supposed to unearth some kind of pure, inherently good, perfect optimal goal that transcends all reason and is true by virtue of existing or something.
The AI is supposed to take the human POV and think “if I were these humans, what would I want the AI’s goal to be”.
I didn’t mention this explicitly because I didn’t think it was necessary, but the “optimal goal” is purely subjective from the POV of humanity, and the AI is aware of this.
some kind of pure, inherently good, perfect optimal goal that transcends all reason and is true by virtue of existing or something.
But if that is true, the AI will say so. What’s more, you kind of need the AI to refrain from acting on it, if it is a human-unfriendly objective moral truth. There are ethical puzzles where it is apparently right to lie or keep schtum, because of the consequences of telling the truth.