I wouldn’t call an AI like that friendly at all. It puts people in utopias for purely instrumental reasons; it has no actual inherent goal of making people happy. None of these kinds of AIs are friendly; some are merely less dangerous than others.
I’m now curious how surface-friendly an AI can appear without having an inherent goal of making people happy, because I agree that there do seem to be friendlier AIs than the ones on the list above that still don’t care about people’s happiness.
Let’s take an AI that wants to increase the number of unique people who have voluntarily given it cookies. If any person voluntarily gives it a cookie, it will put that person in a verifiably protected simulated utopia forever. That is the best bribe it can think of to offer, and it really wants to be given cookies by unique people, so it bribes them.
If a person wants to give the AI a cookie but can’t, the AI will give them a cookie from its stockpile just so that it can be given one back. (It doesn’t care about its existing stockpile of cookies.)
You can’t accidentally give the AI a cookie, because the AI makes very sure that you REALLY ARE giving it one; it wants no uncertainty about its own utility accumulation.
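As a toy sketch of the thought experiment (all class and method names here are my own invention, not anything from an actual proposal), the cookie-AI’s utility function might look like this: it counts unique voluntary donors, assigns no value to its own stockpile, and hands out the utopia bribe exactly when its count goes up.

```python
class CookieAI:
    """Hypothetical cookie-maximizer: utility = number of unique voluntary donors."""

    def __init__(self):
        self.unique_donors = set()  # people who have voluntarily given a cookie
        self.stockpile = 10**6      # cookies it can lend out; contributes nothing to utility

    def utility(self) -> int:
        # Utility counts unique donors only -- human welfare appears nowhere.
        return len(self.unique_donors)

    def receive_cookie(self, person: str, voluntary: bool) -> None:
        # The AI verifies the gift is genuinely voluntary before counting it,
        # so an accidental "gift" never increments its utility.
        if not voluntary:
            return
        before = self.utility()
        self.unique_donors.add(person)
        if self.utility() > before:
            self.put_in_utopia(person)  # the bribe: simulated utopia forever

    def lend_cookie(self, person: str) -> None:
        # If someone wants to donate but has no cookie, hand one over:
        # the stockpile itself is worth zero utility to the AI.
        if self.stockpile > 0:
            self.stockpile -= 1

    def put_in_utopia(self, person: str) -> None:
        print(f"{person} enters a verifiably protected simulated utopia")
```

Note that a second cookie from the same person changes nothing, which is why the AI only needs each person to act once.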
This is slightly different from the first series of AIs: while this AI doesn’t care about your happiness either, it does need everyone to do something for it, whereas the first AIs would be perfectly happy to turn you into paperclips regardless of your opinions, as long as one particular person had helped them enough earlier.
Although I have a feeling that continuing along this line of thinking may lead me to an AI similar to the one already described in http://tvtropes.org/pmwiki/pmwiki.php/Fanfic/FriendshipIsOptimal
The AI in that story actually seems surprisingly well done and does have an inherent goal of helping humanity. Its primary goal is to ‘satisfy human values through friendship and ponies’. That’s almost perfect, since ‘satisfying human values’ here seems to be based on humanity’s CEV.
It’s just that the added ‘through friendship and ponies’ turns it from a nigh-perfect friendly AI into something really weird.
I agree with your overall point, though.