It’s not clear at all that AGI will have a utility function.
That’s something I’m willing to take bets on. Regardless, it is precisely the type of question we had better start studying right now. It is a question with high FAI relevance, and one that is likely to matter for AGI regardless of friendliness.
But furthermore, bolting a complex, friendly utility function onto whatever AI architecture we come up with will probably be a very difficult feat of engineering...
I doubt it. IMO an AGI will be able to optimize any utility function; that’s what makes it an AGI. However, even if you’re right, we still need to start working on finding that utility function.
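To make that intuition concrete, here is a toy sketch (my own illustration of the “general optimizer” framing, not a claim about any real architecture): the optimization machinery is kept separate from the utility function it is pointed at, so the utility function is just a swappable parameter.

```python
# Toy sketch (an illustration of the "general optimizer" framing, not anyone's
# actual AGI design): the optimization machinery is separate from the utility
# function it optimizes, so "which utility function?" is a swappable parameter.
from typing import Callable, Dict, Iterable

def choose_action(actions: Iterable[str],
                  outcome_model: Callable[[str], Dict[str, float]],
                  utility: Callable[[str], float]) -> str:
    """Pick the action with the highest expected utility under a world model."""
    def expected_utility(action: str) -> float:
        outcomes = outcome_model(action)  # mapping outcome -> probability
        return sum(p * utility(o) for o, p in outcomes.items())
    return max(actions, key=expected_utility)

# Hypothetical example: the same optimizer pointed at two different utilities
# picks different actions; nothing about the machinery changes.
model = lambda a: {"paperclips": 0.9, "nothing": 0.1} if a == "build_factory" else {"nothing": 1.0}
print(choose_action(["build_factory", "idle"], model, lambda o: 1.0 if o == "paperclips" else 0.0))  # build_factory
print(choose_action(["build_factory", "idle"], model, lambda o: 1.0 if o == "nothing" else 0.0))     # idle
```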
I question both of these premises. It could be like you or me, in the sense that it simply executes a sequence of actions with no coherent or constant driving utility function (even long-term goals are often inconsistent with each other), and even if you could demonstrate to it a utility function that met some extremely high standards, it would not be persuaded to adopt it. Building in such a utility function might be possible, but it would not necessarily be natural; in fact I bet it would be unnatural and difficult.
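As a toy illustration of what “no coherent utility function” means (my own example, not a claim about any particular system): if an agent’s revealed preferences are cyclic, say A over B, B over C, and C over A, then no real-valued utility function can represent them, since that would require u(A) > u(B) > u(C) > u(A).

```python
# Toy illustration (my example): cyclic preferences A > B, B > C, C > A cannot
# be represented by any real-valued utility function, because that would
# require u(A) > u(B) > u(C) > u(A), a contradiction.
from itertools import permutations

preferences = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is preferred to y

def representable_by_some_utility(prefs, items=("A", "B", "C")) -> bool:
    """Check whether any strict ranking of the items (hence any utility
    assignment) is consistent with the given pairwise preferences."""
    for order in permutations(items):                 # candidate ranking, best first
        rank = {x: i for i, x in enumerate(order)}
        if all(rank[x] < rank[y] for x, y in prefs):
            return True
    return False

print(representable_by_some_utility(preferences))  # False: the cycle rules it out
```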
I understand your rebuttal to “friendliness research is too premature to be useful” is “it is important enough to risk being premature”, but I hope you can agree that a stronger argument would put forward actual evidence that this risk is not particularly large.
But let’s leave that aside. I’ll concede that, under some circumstances, developing a strong friendliness theory before strong AI could be the only path to safe AI.
I still think it is mistaken to ignore intermediate scenarios and focus only on that case. I wrote about this before in a post, How to Study AGIs safely, which you commented on.
It could be like you or me, in the sense that it simply executes a sequence of actions with no coherent or constant driving utility function...
I doubt the first AGI will be like this, unless you count WBE as AGI. But if it is, that’s very bad news, since it would be very difficult to make it friendly. Such an AGI would be akin to an alien species that evolved under conditions vastly different from ours: it would probably have very different values.