So: I want to finish my novel, but I spend the day noodling around the Internet instead.
Then Omega hands me an AI which it assures me is programmed, error-free, to analyze me, calculate my utility function, and optimize my environment accordingly.
I run the AI, and it determines exactly which parts of my mind manifest a desire to finish the novel, which parts manifest a desire to respond to the Internet, and which parts manifest a desire to have the novel be finished. Call them M1, M2 and M3. (They are of course overlapping sets.) Then it determines somehow which of these things are part of my utility function, and which aren’t, and to what degree.
So...
Case 1: The AI concludes that M1 is part of my utility function and M2 and M3 are not. Since it is designed to maximize my utility, it constructs an environment in which M1 triumphs. For example, perhaps it installs a highly sophisticated filter that blocks out 90% of the Internet. Result: I get lots more high-quality work done on the novel. I miss the Internet, but the AI doesn’t care, because that’s the result of M2 and M2 isn’t part of my utility function.
Case 2: The AI concludes that M3 and M2 are part of my utility function and M1 is not, so it finishes the novel itself and modifies the Internet to be even more compelling. I miss having the novel to work on, but again the AI doesn’t care.
Case 3: The AI concludes that all three things are part of my utility function. It finishes the novel but doesn’t tell me about it, thereby satisfying M3 (though I don’t know it). It makes a few minor tweaks to my perceived environment but mostly leaves it alone, since it is already pretty well balanced between M1 and M2 (which is not surprising, since I was responding to those mental structures when I constructed my current situation).
If I’m understanding you correctly, you’re saying that I can’t really know which of these results (or of countless other possibilities) will happen, but that whichever one it is, I should have high confidence that all other possibilities would by my own standards have been worse… after all, that’s what it means to maximize my utility function.
Yes?
It seems to follow that if the AI has an added feature whereby I can ask it to describe what it’s about to do before it does it and then veto doing it, I ought not invoke that feature. (After all, I can’t make the result better, but I might make the result worse.)
Yes?
Assuming you trust Omega to mean the same thing as you do when talking about your preferences and utility function, then yes. If the AI looks over your mind and optimizes the environment for your actual utility function (which could well be a combination of M1, M2 and M3), then any veto you exercise can only make the result worse than the optimal one.
Of course, if there’s doubt about the programming of the AI, use of the veto feature would probably be wise, just in case it’s not a good genie.
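The logic of that reply can be sketched numerically. All of the weights and scores below are hypothetical, invented purely for illustration; the point is only that if the AI selects the environment maximizing your true utility, then any veto necessarily substitutes an option of equal or lower utility:

```python
# Hypothetical weights of the three mental structures in the true utility function.
weights = {"M1": 0.5, "M2": 0.3, "M3": 0.2}

# Hypothetical candidate environments the AI might construct,
# each scored 0..1 on how well it satisfies each structure.
candidates = {
    "filter_internet":     {"M1": 0.9, "M2": 0.1, "M3": 0.6},  # Case 1
    "finish_novel_for_me": {"M1": 0.0, "M2": 0.8, "M3": 1.0},  # Case 2
    "balanced_tweaks":     {"M1": 0.6, "M2": 0.6, "M3": 0.7},  # Case 3
}

def utility(scores):
    """Weighted sum of structure satisfactions."""
    return sum(weights[m] * scores[m] for m in weights)

# The AI picks the utility-maximizing environment.
best = max(candidates, key=lambda c: utility(candidates[c]))

# A veto replaces `best` with some other candidate, whose utility
# cannot exceed the optimum by construction.
assert all(utility(candidates[c]) <= utility(candidates[best])
           for c in candidates)
```

With these particular made-up numbers the balanced option wins (utility 0.62 versus 0.60 and 0.44), but the assertion at the end holds regardless of the numbers: whatever the true weights are, vetoing the argmax cannot improve on it.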
You seem to be imagining a relatively weak AI. For instance, given the vast space of possibilities, there are doubtless environmental tweaks that would result in both more fun on the Internet and more high-quality work on the novel. (This is to say nothing of more invasive interventions.)
The answer to your questions is yes: assuming the AI does what Omega says it does, you won’t want to use your veto.
Not necessarily weak overall; merely one that devotes relatively few resources to this particular tiny subset of my preference-space. After all, there are many other things I care about more.
But, sure, a sufficiently powerful optimizer will come up with solutions so much better that it will never even occur to me to doubt that all other possibilities would be worse. And given a sufficiently powerful optimizer, I might as well invoke the preview feature if I feel like it, because I’ll find the resulting preview so emotionally compelling that I won’t want to use my veto.
That case obscures rather than illustrates the question I’m asking, so I didn’t highlight it.
Case 4: The AI makes tweaks to your current environment, constructing it in accordance with your mental structures but more efficiently than you could have done yourself.
Sure. In which case I still noodle around on the Internet a bunch rather than work on my novel, but at least I can reassure myself that this optimally reflects my real preferences, and any belief I might have that I would actually rather get more work done on my novel than I do is simply an illusion.
If those are, in fact, your real preferences, then sure.