It’s not clear at all that AGI will have a utility function.
That’s something I’m willing to take bets on. Regardless, it is precisely the type of question we had better start studying right now. It is a question with high FAI relevance, and one that is likely to matter for AGI regardless of friendliness.
But furthermore, bolting a complex, friendly utility function onto whatever AI architecture we come up with will probably be a very difficult feat of engineering...
I doubt it. IMO an AGI will be able to optimize any utility function; that’s what makes it an AGI. However, even if you’re right, we still need to start working on finding that utility function.
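To make that intuition concrete, here is a toy sketch (my own illustration of the “general optimizer” framing, not a claim about any real architecture): the optimization machinery is kept separate from the utility function it is pointed at, so the utility function is just a swappable parameter.

```python
# Toy sketch (an illustration of the "general optimizer" framing, not anyone's
# actual AGI design): the optimization machinery is separate from the utility
# function it optimizes, so "which utility function?" is a swappable parameter.
from typing import Callable, Dict, Iterable

def choose_action(actions: Iterable[str],
                  outcome_model: Callable[[str], Dict[str, float]],
                  utility: Callable[[str], float]) -> str:
    """Pick the action with the highest expected utility under a world model."""
    def expected_utility(action: str) -> float:
        outcomes = outcome_model(action)  # mapping outcome -> probability
        return sum(p * utility(o) for o, p in outcomes.items())
    return max(actions, key=expected_utility)

# Hypothetical example: the same optimizer pointed at two different utilities
# picks different actions; nothing about the machinery changes.
model = lambda a: {"paperclips": 0.9, "nothing": 0.1} if a == "build_factory" else {"nothing": 1.0}
print(choose_action(["build_factory", "idle"], model, lambda o: 1.0 if o == "paperclips" else 0.0))  # build_factory
print(choose_action(["build_factory", "idle"], model, lambda o: 1.0 if o == "nothing" else 0.0))     # idle
```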
I question both of these premises. It could be like you or me, in the sense that it simply executes a sequence of actions with no coherent or constant driving utility function (even long-term goals are often inconsistent with each other), and even if you could demonstrate to it a utility function that met some extremely high standards, it would not be persuaded to adopt it. Building in such a utility function might be possible, but it would not necessarily be natural; in fact I bet it would be unnatural and difficult.
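As a toy illustration of what “no coherent utility function” means (my own example, not a claim about any particular system): if an agent’s revealed preferences are cyclic, say A over B, B over C, and C over A, then no real-valued utility function can represent them, since that would require u(A) > u(B) > u(C) > u(A).

```python
# Toy illustration (my example): cyclic preferences A > B, B > C, C > A cannot
# be represented by any real-valued utility function, because that would
# require u(A) > u(B) > u(C) > u(A), a contradiction.
from itertools import permutations

preferences = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is preferred to y

def representable_by_some_utility(prefs, items=("A", "B", "C")) -> bool:
    """Check whether any strict ranking of the items (hence any utility
    assignment) is consistent with the given pairwise preferences."""
    for order in permutations(items):                 # candidate ranking, best first
        rank = {x: i for i, x in enumerate(order)}
        if all(rank[x] < rank[y] for x, y in prefs):
            return True
    return False

print(representable_by_some_utility(preferences))  # False: the cycle rules it out
```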
I understand your rebuttal to “friendliness research is too premature to be useful” is “it is important enough to risk being premature”, but I hope you can agree that a stronger argument would put forward actual evidence that this risk is not particularly large.
But let’s leave that aside. I’ll concede that, under some circumstances, developing a strong friendliness theory before strong AI could be the only path to safe AI.
I still think it is mistaken to ignore intermediate scenarios and focus only on that case. I wrote about this before in a post, How to Study AGIs safely, which you commented on.
It could be like you or me, in the sense that it simply executes a sequence of actions with no coherent or constant driving utility function...
I doubt the first AGI will be like this, unless you count WBE as AGI. But if it is, that’s very bad news, since it would be very difficult to make it friendly. Such an AGI would be akin to an alien species that evolved under conditions vastly different from ours: it would probably have very different values.