Yes, it certainly can—if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’; sometimes they even literally kill us, but usually they help us.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes. If some of the child AI’s goals (for example those involved in being more good) are opposed to the parent’s goals (for example those which make the parent AI less good), the parent is not going to just let the child achieve its goals. Rational agents do not let their utility functions change.
Sure it is—this part at least is easy. For example, an AGI that is fully altruistic and only experiences love as its single emotion would clearly be “somewhat better than us” from our perspective in every sense that matters.
If you mean that the AI doesn’t suffer from the akrasia and selfishness and emotional discounting and uncertainty about our own utility function which prevent us from acting out our moral beliefs, then I agree with you. That’s the AI being more rational than us, and therefore better at optimising for its utility function. But a literally better utility function is impossible, given that “better” is defined by our utility function.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function. If “better” meant a different utility function, then it would be unclear why we are trying to create an AI that optimizes for that rather than for what we want.
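To make that concrete, here is a toy sketch in Python. The outcomes, the candidate utility functions, and every number in it are invented purely for illustration; the point is just that each candidate AI maximizes its own utility function, but we judge the result by ours, so no candidate can beat the one built around our own utility function by that very standard.

```python
# Toy illustration: "better" is judged by *our* utility function, so no
# alternative utility function for the AI can beat ours by that standard.
# All outcomes, candidates and numbers below are invented for illustration.

outcomes = ["write_novel", "browse_internet", "make_paperclips"]

# Our (hypothetical) utility over outcomes.
our_utility = {"write_novel": 10, "browse_internet": 4, "make_paperclips": 0}

# Candidate utility functions an AI could be built to maximize.
candidates = {
    "ours":         dict(our_utility),
    "internet_fan": {"write_novel": 1, "browse_internet": 9, "make_paperclips": 0},
    "paperclipper": {"write_novel": 0, "browse_internet": 1, "make_paperclips": 9},
}

def outcome_chosen_by(ai_utility):
    """The AI picks whichever outcome maximizes *its own* utility function."""
    return max(outcomes, key=lambda o: ai_utility[o])

# Score each candidate AI by how much *we* like the outcome it produces.
for name, u in candidates.items():
    chosen = outcome_chosen_by(u)
    print(f"{name:13s} -> {chosen:16s} our utility: {our_utility[chosen]}")

# The AI built around our own utility function necessarily comes out on top,
# because "on top" is itself measured by our utility function.
```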
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
The same issue applies to children—they don’t necessarily have the same ‘utility function’; sometimes they even literally kill us, but usually they help us.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes
So do we create children as our ‘slaves’ for our own purposes? You seem to be categorically ruling out the entire possibility of humans creating human-like AIs that have a parent-child relationship with their creators.
So, just to make it precisely clear: I’m talking about that type of AI specifically. The importance and feasibility of that type of AGI versus other types is a separate discussion.
Sure it is—this part at least is easy. For example, an AGI that is fully altruistic and only experiences love as its single emotion would clearly be “somewhat better than us” from our perspective in every sense that matters.
If you mean that the AI doesn’t [ .. ]
That’s the AI being more rational than us, and therefore better at optimising for its utility function.
I don’t see it as having anything to do with rationality.
The altruistic human-ish AGI mentioned above would be better than current humans from our current perspective—more like what we wish ourselves to be, and more able to improve our world.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function.
Yes.
This is obvious if its ‘utility function’ is just a projection of my own (i.e. it simulates what I would want and uses that as its utility function), but that isn’t even necessary: its utility function could be somewhat more complex than a simulated projection of my own and still help fulfill my utility function.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
If by inspection you just mean teaching the AI morality in human language, then I agree, but that’s a side point.
So: I want to finish my novel, but I spend the day noodling around the Internet instead.
Then Omega hands me an AI which it assures me is programmed error-free to analyze me and calculate my utility function and optimize my environment in terms of it.
I run the AI, and it determines exactly which parts of my mind manifest a desire to finish the novel, which parts manifest a desire to respond to the Internet, and which parts manifest a desire to have the novel be finished. Call them M1, M2 and M3. (They are of course overlapping sets.) Then it determines somehow which of these things are part of my utility function, and which aren’t, and to what degree.
So...
Case 1: The AI concludes that M1 is part of my utility function and M2 and M3 are not. Since it is designed to maximize my utility, it constructs an environment in which M1 triumphs. For example, perhaps it installs a highly sophisticated filter that blocks out 90% of the Internet. Result: I get lots more high-quality work done on the novel. I miss the Internet, but the AI doesn’t care, because that’s the result of M2 and M2 isn’t part of my utility function.
Case 2: The AI concludes that M3 and M2 are part of my utility function and M1 is not, so it finishes the novel itself and modifies the Internet to be even more compelling. I miss having the novel to work on, but again the AI doesn’t care.
Case 3: The AI concludes that all three things are part of my utility function. It finishes the novel but doesn’t tell me about it, thereby satisfying M3 (though I don’t know it). It makes a few minor tweaks to my perceived environment, but mostly leaves it alone, since it is already pretty well balanced between M1 and M2 (which is not surprising, since I was responding to those mental structures when I constructed my current situation).
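To make the three cases concrete, here is a toy sketch; all the weights and scores are invented purely for illustration, and a real mind’s M1/M2/M3 would of course be nothing like this clean. Each case amounts to a different guess about which of M1, M2 and M3 count toward my utility function, and the AI picks whichever environment maximizes the resulting weighted sum.

```python
# Toy model of the three cases; weights and scores are invented for illustration.
# M1: working on the novel, M2: enjoying the Internet, M3: having the novel finished.

environments = {
    # rough 0-10 scores for how well each environment satisfies M1, M2, M3
    "filter_most_of_internet":   {"M1": 9, "M2": 2, "M3": 0},
    "ai_finishes_novel_openly":  {"M1": 0, "M2": 9, "M3": 10},
    "minor_tweaks_quiet_finish": {"M1": 5, "M2": 6, "M3": 10},
}

cases = {
    "case_1_only_M1_counts":  {"M1": 1, "M2": 0, "M3": 0},
    "case_2_M2_and_M3_count": {"M1": 0, "M2": 1, "M3": 1},
    "case_3_all_three_count": {"M1": 1, "M2": 1, "M3": 1},
}

def best_environment(weights):
    """The AI picks the environment maximizing what it takes to be my utility."""
    score = lambda env: sum(w * environments[env][m] for m, w in weights.items())
    return max(environments, key=score)

for case, weights in cases.items():
    print(case, "->", best_environment(weights))

# Which environment comes out "optimal" depends entirely on which of
# M1, M2 and M3 the AI decides are part of my utility function.
```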
If I’m understanding you correctly, you’re saying that I can’t really know which of these results (or of countless other possibilities) will happen, but that whichever one it is, I should have high confidence that all other possibilities would by my own standards have been worse… after all, that’s what it means to maximize my utility function.
Yes?
It seems to follow that if the AI has an added feature whereby I can ask it to describe what it’s about to do before it does it and then veto doing it, I ought not invoke that feature. (After all, I can’t make the result better, but I might make the result worse.)
Yes?
Assuming you trust Omega to mean the same thing as you do when talking about your preferences and utility function, then yes. If the AI looks over your mind and optimizes the environment for your actual utility function (which could well be a combination of M1, M2 and M3), then any veto you exercise must make the result worse than the optimal one.
Of course, if there’s doubt about the programming of the AI, use of the veto feature would probably be wise, just in case it’s not a good genie.
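Put as a minimal sketch, with purely hypothetical utilities: if the AI’s proposal really is the maximum of your actual utility function, then anything a veto forces in its place scores no higher.

```python
# Minimal sketch with hypothetical utilities: the AI's proposal is already the
# argmax of your actual utility function, so whatever a veto forces instead
# cannot score higher.
import random

options = ["proposal_A", "proposal_B", "proposal_C"]
true_utility = {"proposal_A": 7, "proposal_B": 3, "proposal_C": 5}

ai_proposal = max(options, key=true_utility.get)  # what the AI would do
vetoed_fallback = random.choice([o for o in options if o != ai_proposal])

assert true_utility[vetoed_fallback] <= true_utility[ai_proposal]
# Vetoing swaps out the maximum for something else, so by the very utility
# function being maximized it can only leave you equal or worse off.
```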
You seem to be imagining a relatively weak AI. For instance, given the vast space of possibilities, there are doubtless environmental tweaks that would result in more fun on the Internet and more high-quality work on the novel. (This is to say nothing of more invasive interventions.)
The answer to your questions is yes: assuming the AI does what Omega says it does, you won’t want to use your veto.
Not necessarily weak overall; merely one that devotes relatively few resources to addressing this particular tiny subset of my preference-space. After all, there are many other things I care about more.
But, sure, a sufficiently powerful optimizer will come up with solutions so much better that it will never even occur to me to doubt that all other possibilities would be worse. And given a sufficiently powerful optimizer, I might as well invoke the preview feature if I feel like it, because I’ll find the resulting preview so emotionally compelling that I won’t want to use my veto.
That case obscures rather than illustrates the question I’m asking, so I didn’t highlight it.
Case 4: The AI makes tweaks to your current environment to bring it into accordance with your mental structures, but more efficiently than you could have done in the first place.
Sure. In which case I still noodle around on the Internet a bunch rather than work on my novel, but at least I can reassure myself that this optimally reflects my real preferences, and any belief I might have that I would actually rather get more work done on my novel than I do is simply an illusion.
If those are, in fact, your real preferences, then sure.