If I considered it high-probability that you could make a change and you were claiming you’d make a change that wouldn’t be of highly negative utility to everyone else, I might well prepare for that change. Because your proposed change is highly negative to everyone else, I might well attempt to resist or counteract that change.

Why does that make sense, though? Why do other people’s current utility functions count if mine don’t? How does that extend to a situation where you changed everyone else? How does it extend to a situation where I could change everyone else but I don’t have to? If an AI programmed to make its programmer happy does so by directly changing the programmer’s brain to provide a constant mental state of happiness, why is that a bad thing?
The way I’m thinking about it is that other people’s utility functions count (for you, now) because you care about them. There isn’t some universal magic register of things that “count”; there’s just your utility function which lives in your head (near enough). If you fundamentally don’t care about other people’s utility, and there’s no instrumental reason for you to do so, then there’s no way I can persuade you to start caring.
So it’s not so much that caring about other people’s utility “makes sense”, just that you do care about it. Whether the AI is doing a bad thing (from the point of view of the programmer) depends on what the programmer actually cares about. If he wants to climb Mount Everest, then being told that he will be rewired to enjoy just lying on a sofa doesn’t get him up the mountain. He might also care about the happiness of his future self, but it could be that his desire to climb Mount Everest overwhelms that.
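(As a minimal sketch of the point just made, with notation that is mine and not anything stated in the exchange: if other people’s utility matters to your decisions at all, one way to picture it is as a weighted term inside your own utility function.)

$$
U_{\text{you}}(w) \;=\; U_{\text{personal}}(w) \;+\; \sum_i \alpha_i \, U_i(w), \qquad \alpha_i \ge 0
$$

(Here $w$ is a state of the world, $U_i$ is person $i$’s utility, and $\alpha_i$ is how much you happen to care about it. If every $\alpha_i$ is zero and no instrumental reason changes that, there is nothing inside $U_{\text{you}}$ for an appeal to other people’s utility to latch onto.)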
You’re saying that present-me’s utility function counts and no-one else’s does (apart from their position in present-me’s function) because present-me is the one making the decision? That my choices must necessarily depend on my present function and only depend on other/future functions through how much I care about their happiness? That seems reasonable. But my current utility function tells me that there is an N large enough that N utilon-seconds for other people’s functions counts more in my function than any possible thing in the expected lifespan of present-me’s utility function.
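(Purely as an illustrative rendering of that last claim, again with symbols of my own choosing rather than anything from the conversation:)

$$
\exists N :\; U_{\text{me}}\big(\text{others gain } N \text{ utilon-seconds}\big) \;>\; \sup_{x \in X_{\text{self}}} U_{\text{me}}(x)
$$

(where $X_{\text{self}}$ stands for any outcome concerning only present-me within the expected lifespan of present-me’s utility function; the claim is just that some sufficiently large $N$ exists, not that any particular $N$ does the job.)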
Sure. That might well be so. I’m not saying you have to be selfish!
However, you’re talking about utilons for other people—but I doubt that that’s the only thing you care about. I would kind of like for Clippy to get his utilons, but in the process, the world would get turned into paperclips, and I care much more about that not happening! So if everyone were to be turned into paperclip maximizers, I wouldn’t necessarily roll over and say, “Alright, turn the world into paperclips”. Maybe if there were enough of them, I’d be OK with it, as there’s only one world to lose, but it would have to be an awful lot!
So you, like I, might consider turning the universe into minds that most value a universe filled with themselves?
I’d consider it. On reflection, I think that for me personally what I care about isn’t just minds of any kind having their preferences satisfied, even if those are harmless ones. I think I probably would like them to have more adventurous preferences! The point is, what I’m looking at here are my preferences for how the world should be; whether I would prefer a world full of wire-headers or one full of people doing awesome actual stuff. I think I’d prefer the latter, even if overall the adventurous people didn’t get as many of their preferences satisfied. A typical wire-header would probably disagree, though!
Fair.